Preface



The importance of typed languages for building robust software systems is, by 
now, an undisputed fact. Years of research have led to languages with richly 
expressive, yet easy to use, type systems for high-level programming languages. 
Types provide not only a conceptual framework for language designers, but also 
afford positive benefits to the programmer, principally the ability to express and 
enforce levels of abstraction within a program. 

Early compilers for typed languages followed closely the methods used for 
their untyped counterparts. The role of types was limited to the earliest sta- 
ges of compilation, and they were thereafter ignored during the remainder of 
the translation process. More recently, however, implementors have come to re- 
cognize the importance of types during compilation and even for object code. 
Several advantages of types in compilation have been noted to date: 

- They support self-checking by the compiler. By tracking types during
  compilation it is possible for an internal type checker to detect translation
  errors at an early stage, greatly facilitating compiler development.

- They support certification of object code. By extending types to the
  generated object code, it becomes possible for a code user to ensure the basic
  integrity of that code by checking its type consistency before execution.

- They support optimized data representations and calling conventions, even
  in the presence of modularity. By passing types at compile-, link-, and even
  run-time, it is possible to avoid compromises of data representation imposed
  by untyped compilation techniques.

- They support checked integration of program components. By attaching type
  information to modules, a linker can ensure the integrity of a composite
  system by checking compliance with interface requirements.

Types in Compilation (TIC) is a recurring workshop devoted to the appli- 
cation of types in the implementation of programming languages. This volume 
consists of a selection of papers from the TIC 2000 Workshop held in Montreal, 
Canada in September 2000. The papers published herein were chosen from sub- 
missions solicited after the meeting by a rigorous refereeing process comparable 
in depth and scope to the selection criteria for an archival journal. Each paper 
was reviewed by at least three referees chosen from the TIC 2000 program com- 
mittee (named below), with final publication decisions made by the program 
chair. This volume represents the result of that review and revision process. 
Sound and Complete Elimination of Singleton Kinds



Karl Crary 

Carnegie Mellon University 



Abstract. Singleton kinds provide an elegant device for expressing type 
equality information resulting from modern module languages, but they 
can complicate the metatheory of languages in which they appear. I 
present a translation from a language with singleton kinds to one with- 
out, and prove that translation to be sound and complete. This transla- 
tion is useful for type-preserving compilers generating typed target lan- 
guages. The proof of soundness and completeness is done by normalizing 
type equivalence derivations using Stone and Harper’s type equivalence 
decision procedure. 



1 Introduction 

Type-preserving compilation, compilation using statically typed intermediate 
languages, offers many compelling advantages over conventional untyped com- 
pilation. A typed compiler can utilize type information to enable optimizations 
that would otherwise be prohibitively difficult or impossible. Internal type check- 
ing can be used to help debug a compiler by catching errors introduced into 
programs in optimization or transformation stages. Finally, if preserved through 
the compiler to its ultimate output, types can be used to certify that executables 
are safe, that is, free of certain fatal errors or malicious behavior [?].

For typed compilation to be practical, we require elegant yet expressive type 
theories for use in the compiler: expressive because they must support the full 
expressive power of a real source language, and elegant because they must be 
practical for a compiler to manipulate. One important issue arising in the design 
of such type theories for compiling Standard ML, Objective CAML, and similar 
languages is how to account for type abbreviations and sharing constraints in 
the module language. For example, consider the following SML signature: 

signature SIG =
  sig
    type t = int
    val x : t
    val f : t -> t
  end

If S is a structure having signature SIG, the type theory must ensure that S.t 
is interchangeable with int in any code having access to S. 

R. Harper (Ed.): TIC 2000, LNCS 2071, pp. 1-21, 2001.

© Springer-Verlag Berlin Heidelberg 2001
The standard account of sharing in type theory was developed independently
by Harper and Lillibridge, under the name translucent sums [?], and by Leroy,
under the name manifest types [?] (and extended in Leroy [?]). These type
theories provide a facility for stating type abbreviations in signatures and
(importantly) ensure the correct propagation of type information resulting from
those abbreviations. (Exactly what is meant by correct propagation is discussed
in Section 2.1.) Translucent sums are employed in the type-theoretic definition of
Standard ML given by Harper and Stone [?] (currently the only formal account
of an entire practical programming language in type theory), and manifest types
are similarly employed (somewhat less formally) by Leroy for Objective
CAML.

In this paper I consider a type theory based on singleton kinds [?], a variant
of the translucent sum/manifest type formalism. The singleton kind calculus
differs from the standard accounts in that it separates the module system from
the mechanisms for type abbreviations and focuses on the latter. This separation
is appropriate, first, because the two issues are orthogonal (although they
typically arise together in practice), but more importantly, because type
abbreviations persist even after the compiler eliminates modules [?]. Furthermore,
separating modules from the issue of type propagation makes it unnecessary
to compare types by name (as in the module-based accounts), which makes it
possible to propagate more type information. (An example of this is given in
Section 2.1.)

Singleton kinds provide a very elegant and uniform type-theoretic mechanism 
for ensuring the propagation of type information. Kinds are used in type theories 
containing higher-order type constructors to classify type constructors just as 
types classify ordinary terms. Using singleton kinds, in the above example S.t
is given the kind $S(\mathtt{int})$, the kind containing only the type int (and types
equal to it). Propagation of type information is then obtained by augmenting
the typechecker with the rule that if $\tau$ has kind $S(\tau')$, then $\tau = \tau'$.

When using singleton kinds in practice, the question arises of how singleton 
kinds affect typechecking, given that they provide a new (and conceivably difficult
to discover) way to show types to be equal. In fact, Harper and Stone [?]
show that there exists a very simple algorithm for deciding equality of types in 
the presence of singleton kinds. Indeed, the algorithm is very nearly identical 
to the usual algorithm employed in the absence of singletons in practice (as op- 
posed to the less-efficient algorithms often considered in theory). In this sense, 
singleton kinds complicate the compiler very little. 

Nevertheless, there are some good reasons why one may want to compile 
away singleton kinds: Although the decision algorithm discussed above is sim- 
ple, its proof of correctness is quite complex, and may be difficult to extend 
to more complicated type systems. (The complexity of this proof is probably 
the source of the common misconception that singleton kinds make typechecking
difficult.) The latter phases of a type-preserving compiler may involve some
very complicated type systems indeed [?]. Extending Stone and Harper’s
proof to these type systems, some of which already have nontrivial decidability
proofs, is a daunting prospect. Moreover, there already exist a variety of tools
for manipulating low-level typed languages that, by and large, do not support 
singleton kinds. 

In this paper, I present such a strategy for compiling away singleton kinds. 
To implement the source language correctly, this elimination strategy should 
be sound and complete relative to the singleton calculus, that is, two types 
should be equal in the singleton calculus if and only if they are equal after 
singleton elimination. This means that the elimination process does not cause 
any programs to cease to typecheck, nor does it allow any programs to typecheck 
that would not have before.^1

The compilation process is based on the natural idea of substituting definitions
for any appearances of variables having singleton kinds. However, how to
do this in a sound and complete manner is not obvious because, as discussed
below in Section 3.1, in the presence of internal bindings, it is difficult to
determine whether or not a variable has a singleton kind. Although I show this issue
can be handled elegantly, as with Stone and Harper, the correctness proof is not 
obvious. This proof is the central technical contribution of the paper. 

The existence of a sound and complete compilation strategy does not imply
that singleton kinds are useless. They provide an extremely elegant and succinct
account of ML’s type sharing that (with modules taken out of the picture) is
essentially equivalent to the standard type-theoretic accounts employed to
explain practical source languages. To exploit this result and remove singletons
from consideration entirely (in the absence of some alternative) would require
programmers to eliminate type abbreviations by hand, resulting in verbose,
unreadable code (to no particular benefit). Moreover, singleton kinds may also
be useful for some other purposes such as compression of type information, or
polymorphic closure conversion [?].

What this result does mean is that using translucent sums, manifest types, or
singleton kinds to express sharing in the source language need not constrain the 
compilation strategy. One may use singleton kinds through as many compilation 
phases as desired, and then compile them away and proceed without them. For 
example, a reasonable architecture is to use singleton kinds in the compiler’s 
front end (which performs ML-specific optimizations and transformations), but 
not in the back end (which may use complicated type systems for code generation 
and low-level transformations). 

This paper is organized as follows: In Section 2 I formalize the singleton
kind calculus and discuss some of its subtleties that make it complicated to
work with. In Section 3 I present the singleton elimination strategy and state
its correctness theorem. Section 4 is dedicated to the proof of the correctness
theorem, and concluding remarks appear in the final section.



^1 It may be argued that only the former property is essential to implement the source
language correctly, that it is acceptable to allow more programs to typecheck
provided that the post-translation type system is still sound. Nevertheless, the latter is
still a desirable property.
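\[
\begin{array}{llcl}
\text{kinds} & K & ::= & T \mid S(c) \mid \Pi\alpha{:}K_1.K_2 \mid \Sigma\alpha{:}K_1.K_2 \\
\text{constructors} & c & ::= & \alpha \mid b \mid \lambda\alpha{:}K.c \mid c_1\,c_2 \mid \langle c_1, c_2 \rangle \mid \pi_1 c \mid \pi_2 c \\
\text{assignments} & \Gamma & ::= & \epsilon \mid \Gamma, \alpha{:}K
\end{array}
\]

Fig. 1. Syntax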



This paper assumes familiarity with type systems with higher-order type 
constructors and dependent types. The correctness proof draws from the work 
of Stone and Harper [?] showing decidability of type equivalence in the presence
of singleton kinds, but we will use their results almost entirely “off the shelf,” 
so familiarity with their paper is not required. 



2 A Singleton Kind Calculus

We begin by formalizing the singleton calculus that is the subject of this paper. 
The syntax of the singleton calculus is given in Figure 1. It consists of a class of
type constructors (usually referred to as “constructors” for brevity) and a class
of kinds, which classify constructors. The class of constructors contains variables
(ranged over by $\alpha$), a collection of base types (ranged over by $b$), and the usual
introduction and elimination forms for functions and pairs over constructors. 
We could also add a collection of primitive type operators (such as list or ->) 
without difficulty, but have not done so in the interest of simplicity. 

The kind structure is the novelty of the singleton calculus. The base kinds
include $T$, the kind of all types, and $S(c)$, the kind of all types definitionally equal
to $c$. Thus, $S(c)$ represents a singleton set, up to definitional equality. The
constructor $c$ in $S(c)$ is permitted to be open, and consequently kinds may contain
free constructor variables, which makes it useful to have dependent kinds. The
kind $\Pi\alpha{:}K_1.K_2$ contains functions from $K_1$ to $K_2$, where $\alpha$ refers to the
function’s argument and may appear free in $K_2$. Analogously, the kind $\Sigma\alpha{:}K_1.K_2$
contains pairs of constructors from $K_1$ and $K_2$, where $\alpha$ refers to the left-hand
member and may appear free in $K_2$. As usual, when $\alpha$ does not appear free in
$K_2$, we write $\Pi\alpha{:}K_1.K_2$ as $K_1 \to K_2$ and $\Sigma\alpha{:}K_1.K_2$ as $K_1 \times K_2$.

In addition, the syntax provides a class of assignments, which assign kinds to 
free constructor variables, for use in the calculus’s static semantics. In a practical 
application, the language would be extended with an additional class of terms, 
but for our purposes (which deal with constructor equality) we need not be 
concerned with terms, so they are omitted. 

As usual, alpha-equivalent expressions (written $E \equiv E'$) are taken to be
identical. The capture-avoiding substitution of $c$ for $\alpha$ in $E$ (where $E$ is a kind,
constructor or assignment) is written $E\{c/\alpha\}$. We also will often desire to define
substitutions independent of a particular place of use, so when $\sigma$ is a substitution,
we denote the application of $\sigma$ to the expression $E$ by $E\{\sigma\}$. Separately defined
signature SIG2 =
  sig
    type s
    type t = int
    type u = s * t
    ... value fields ...
  end

funsig FSIG (S : sig
                   type s
                   ... value fields ...
                 end) =
  sig
    type t
    type u = S.s * t
    ... value fields ...
  end

Fig. 2. Sample Signatures



substitutions will usually be written in the form $\{c_1/\alpha_1\} \cdots \{c_n/\alpha_n\}$, denoting
a sequential substitution with the leftmost substitution taking place first.

As discussed in the introduction, the principal intended use of singleton kinds 
is in conjunction with module systems. For example, the type portion of signature 
SIG2 in Figure 2 is translated to the kind:

\[ \Sigma\alpha{:}T.\ \Sigma\beta{:}S(\mathtt{int}).\ S(\alpha * \beta) \]

Note the essential use of dependent sums in this kind. Dependent products arise
from the phase splitting [7] of functors, in which the static portion of a functor
(i.e., its action on types) is separated from the dynamic portion. For example,
after phase-splitting, the type portion of the functor signature FSIG in Figure 2
(given in the syntax of Standard ML of New Jersey version 110) is translated to
the kind:

\[ \Pi\alpha{:}T.\ (\Sigma\beta{:}T.\ S(\alpha * \beta)) \]



2.1 Judgements 

The inference rules defining the static semantics of the singleton calculus are
given in the appendix. For the reader’s convenience, the rules are given in the
same order and essentially the same form as in Stone and Harper [?]. A summary
of the judgements that these rules define, and their interpretations, is given in
Figure 3. The context and kind equality judgements are auxiliary judgements
used in theorems but not by any of the other judgements. For the most part,
the static semantics consists of the usual rules for a dependently typed lambda
Judgement                            Interpretation

$\Gamma \vdash \mathrm{ok}$          $\Gamma$ is a valid assignment
$\vdash \Gamma_1 = \Gamma_2$         $\Gamma_1$ and $\Gamma_2$ are equivalent assignments
$\Gamma \vdash K$                    $K$ is a valid kind
$\Gamma \vdash K_1 \le K_2$          $K_1$ is a subkind of $K_2$
$\Gamma \vdash K_1 = K_2$            $K_1$ and $K_2$ are equivalent kinds
$\Gamma \vdash c : K$                $c$ is a valid constructor with kind $K$
$\Gamma \vdash c_1 = c_2 : K$        $c_1$ and $c_2$ are equivalent as members of kind $K$

Fig. 3. Judgement Forms



calculus with products and sums (but lifted to the constructor level). Again, the
novelty lies with the singleton kinds. Singleton kinds have two introduction rules
(one for kind assignment and one for equivalence),

\[
\frac{\Gamma \vdash c : T}{\Gamma \vdash c : S(c)}
\qquad
\frac{\Gamma \vdash c = c' : T}{\Gamma \vdash c = c' : S(c)}
\]

and one elimination rule:

\[
\frac{\Gamma \vdash c : S(c')}{\Gamma \vdash c = c' : T}
\]

These rules capture the intuition of singleton kinds: The first says that any type 
belongs to its own singleton kind. The second says that equivalent types are also 
considered equivalent as members of their singleton kind. The third says that if 
one type belongs to another’s singleton kind, then those types are equivalent. 

The complexity of the singleton calculus arises from the above rules in
conjunction with the subkinding relation generated by the following two rules:

\[
\frac{\Gamma \vdash c : T}{\Gamma \vdash S(c) \le T}
\qquad
\frac{\Gamma \vdash c_1 = c_2 : T}{\Gamma \vdash S(c_1) \le S(c_2)}
\]

These rules are essential for singleton kinds to serve their intended purpose in a 
modern module system. The first allows a signature to match a supersignature 
obtained by removing equality specifications. For example, structures having the 
signature SIG from the introduction should also match the signature obtained 
by replacing the specification “type t = int” (which we might write type t :
$S(\mathtt{int})$) with simply “type t” (which we might write type t : $T$). The second
allows a signature to match another signature obtained by replacing equality 
specifications with different but equivalent ones. 

The presence of subkinding makes the usual context-insensitive methods of
dealing with equivalence impossible. Consider the identity function, $\lambda\alpha{:}T.\alpha$, and
the constant int function, $\lambda\alpha{:}T.\mathtt{int}$. These functions are clearly inequivalent as
members of $T \to T$; that is, the judgement $\vdash \lambda\alpha{:}T.\alpha = \lambda\alpha{:}T.\mathtt{int} : T \to T$ is
not derivable. However, since $T \to T$ is a subkind of $S(\mathtt{int}) \to T$, these two
functions can also be compared as members of $S(\mathtt{int}) \to T$, and in that kind
they are equivalent. This is because the bodies $\alpha$ and int are compared under
the assignment $\alpha{:}S(\mathtt{int})$, under which $\alpha$ and int are equivalent by the singleton
elimination rule. This example makes it clear that to deal with constructor
equivalence in the singleton calculus, one must take into account the contexts in
which the constructors appear.

The determination of equivalence is further complicated by the fact that the
classifying kind may be given implicitly. For example, the classifying kind may
be imposed by a function on its argument. Consider the constructors $\beta(\lambda\alpha{:}T.\alpha)$
and $\beta(\lambda\alpha{:}T.\mathtt{int})$. These are well-formed under an assignment giving $\beta$ the kind
$(T \to T) \to T$ and also under one giving $\beta$ the kind $(S(\mathtt{int}) \to T) \to T$.
However, for the same reason as above, the two constructors are equivalent
under the second assignment but not the first.^2 The classifying kind can then be
made even further remote by making $\beta$ a function’s formal argument instead of
a free variable, and so on.

2.2 A Singleton-Free System

To formalize our results, we also require a singleton-free target language into 
which to translate expressions from the singleton calculus. We will define the 
singleton-free system in terms of its differences from the singleton calculus. 

We will say that a constructor c (not necessarily well-formed) syntactically 
belongs to the singleton-free calculus provided that c contains no singleton kinds. 
Note that as a consequence of containing no singleton kinds, all product and sum 
kinds may be written in non-dependent form. Also, all kinds in the singleton-free 
calculus are well-formed.

The inference rules for the singleton-free system are obtained by removing
from the singleton calculus all the rules dealing with subkinding and all the
rules dealing with singleton kinds.
Note that derivable judgements in the singleton-free system must be built using
only expressions syntactically belonging to the singleton-free calculus. When a
judgement is derivable in the singleton-free system, we will note this fact by
marking the turnstile $\vdash_{sf}$.

3 Elimination of Singleton Kinds 

The critical rule in the static semantics of the singleton calculus is the singleton
elimination rule. The main aim of the singleton kind elimination

^2 As an aside, in the module-based accounts [?] it is impossible to discover
that the module analogues of these types are equal because comparisons can be made
only on expressions in named form. Naming the expressions $\lambda\alpha{:}T.\alpha$ and $\lambda\alpha{:}T.\mathtt{int}$
obscures the possible connection between them, which depends essentially on their
actual code. (In the first-class account of Harper and Lillibridge [?] this is essential
because the equality may not hold, in addition to being impossible to discover,
since a functor can inspect the store before deciding what type to return.) This is an
example of when the singleton kind account can propagate more type information
than the module-based accounts.
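\[
\begin{array}{rcl}
T^\circ & \stackrel{\mathrm{def}}{=} & T \\
S(c)^\circ & \stackrel{\mathrm{def}}{=} & T \\
(\Pi\alpha{:}K_1.K_2)^\circ & \stackrel{\mathrm{def}}{=} & K_1^\circ \to K_2^\circ \\
(\Sigma\alpha{:}K_1.K_2)^\circ & \stackrel{\mathrm{def}}{=} & K_1^\circ \times K_2^\circ
\end{array}
\]

Fig. 4. Singleton Erasure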



process is to rewrite constructors so that any equivalences that hold for those 
constructors may be derived without using that rule. If this aim is achieved, any 
singleton kinds remaining within the constructors are not used (in any essential 
way) and can simply be erased, resulting in valid constructors and derivations 
in the singleton-free system. 

This erasure process is made precise in Figure 4, which defines a mapping
$(-)^\circ$ from singleton calculus kinds to singleton-free kinds that replaces all
singleton kinds by $T$. The erasure mapping is lifted to constructors and assignments
in the obvious manner. If $\Gamma \vdash c_1 = c_2 : K$ is derivable without using singleton
elimination, then $\Gamma^\circ \vdash_{sf} c_1^\circ = c_2^\circ : K^\circ$ is derivable in the singleton-free system.
A slightly stronger version of this fact is formalized as a lemma in Section 4.
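
The erasure mapping of Figure 4 is small enough to sketch directly. The Python fragment below is only an illustration: the tuple encoding of kinds and constructors and the names `erase_kind` and `erase_con` are assumptions of this sketch and do not appear in the paper.

```python
# Hypothetical tuple encoding (an assumption of this sketch):
#   kinds:        ("T",)  ("S", c)  ("Pi", a, K1, K2)  ("Sigma", a, K1, K2)
#   erased kinds: ("T",)  ("Arrow", K1, K2)  ("Times", K1, K2)
#   constructors: ("var", a)  ("base", b)  ("lam", a, K, c)
#                 ("app", c1, c2)  ("pair", c1, c2)  ("proj", i, c)

T = ("T",)
INT = ("base", "int")


def erase_kind(k):
    """Replace every singleton kind by T; Pi and Sigma become non-dependent."""
    tag = k[0]
    if tag in ("T", "S"):
        return T
    if tag == "Pi":
        return ("Arrow", erase_kind(k[2]), erase_kind(k[3]))
    if tag == "Sigma":
        return ("Times", erase_kind(k[2]), erase_kind(k[3]))
    raise ValueError(f"unknown kind: {k!r}")


def erase_con(c):
    """Lift erasure to constructors by erasing the kind annotation of each lambda."""
    tag = c[0]
    if tag in ("var", "base"):
        return c
    if tag == "lam":
        _, a, k, body = c
        return ("lam", a, erase_kind(k), erase_con(body))
    if tag in ("app", "pair"):
        return (tag, erase_con(c[1]), erase_con(c[2]))
    if tag == "proj":
        return ("proj", c[1], erase_con(c[2]))
    raise ValueError(f"unknown constructor: {c!r}")


# Sigma a:T. Sigma b:S(int). S(a) erases to T x (T x T)
SAMPLE_KIND = ("Sigma", "a", T, ("Sigma", "b", ("S", INT), ("S", ("var", "a"))))
print(erase_kind(SAMPLE_KIND))
# The annotation S(int) on the lambda is erased to T
print(erase_con(("lam", "a", ("S", INT), ("var", "a"))))
```

Because an erased kind contains no constructors, the bound variable of a product or sum kind can no longer occur in it, which is why the sketch returns the non-dependent forms.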

Thus, our goal is to rewrite constructors in such a manner that the singleton 
elimination rule is not necessary. As mentioned in the introduction, this rewriting 
is done by substituting definitions for variables whenever singleton kinds provide 
such definitions. This works out quite simply in first-order cases, but higher- 
order cases raise some subtle issues. We will explore these issues by considering 
a number of examples before defining the fully general elimination process. 

Example 1. Suppose we are working under the assignment $\alpha{:}S(\mathtt{int}), \beta{:}S(\mathtt{bool})$.
Naturally, we replace all free appearances of $\alpha$ in the constructor in question
by int, and replace all free appearances of $\beta$ by bool. This is done simply by
performing the substitution $\{\mathtt{bool}/\beta\}\{\mathtt{int}/\alpha\}$ on the constructor in question.

In this example, we refer to int as the expansion of $\alpha$, and likewise bool is
the expansion of $\beta$. In general, the elimination process will have the same gross
structure as in this example. For an assignment $\Gamma = \alpha_1{:}K_1, \ldots, \alpha_n{:}K_n$ we will
define a substitution $R(\Gamma)$ of the form $\{c_n/\alpha_n\} \cdots \{c_1/\alpha_1\}$ where each $c_i$ is the
expansion of $\alpha_i$.

Example 2. Suppose we are working under the assignment $\Gamma =
\alpha{:}S(\mathtt{int}), \beta{:}S(\alpha)$. In this case, analogously to the previous example, $R(\Gamma)$
is $\{\alpha/\beta\}\{\mathtt{int}/\alpha\}$. Note that since this is a sequential substitution, it is
equivalent to the substitution $\{\mathtt{int}/\beta\}\{\mathtt{int}/\alpha\}$, as one would expect.
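
The role of the ordering in a sequential substitution can be checked with a small sketch. The Python code below is illustrative only: the tuple encoding and the names `subst` and `apply_sequential` are assumptions made here, and the sketch covers only binder-free constructors, so no capture-avoidance is needed.

```python
# Hypothetical encoding of binder-free constructors (an assumption of this sketch):
#   ("var", a)  ("base", b)  ("app", c1, c2)  ("pair", c1, c2)  ("proj", i, c)

INT = ("base", "int")
ALPHA = ("var", "alpha")
BETA = ("var", "beta")


def subst(c, repl, a):
    """Replace every occurrence of variable a in c by repl."""
    tag = c[0]
    if tag == "var":
        return repl if c[1] == a else c
    if tag == "base":
        return c
    if tag in ("app", "pair"):
        return (tag, subst(c[1], repl, a), subst(c[2], repl, a))
    if tag == "proj":
        return ("proj", c[1], subst(c[2], repl, a))
    raise ValueError(f"unknown constructor: {c!r}")


def apply_sequential(c, substs):
    """Apply a list of (replacement, variable) pairs, leftmost first."""
    for repl, a in substs:
        c = subst(c, repl, a)
    return c


# R(Gamma) from Example 2: {alpha/beta}{int/alpha}
R_GAMMA = [(ALPHA, "beta"), (INT, "alpha")]
# The substitution it is claimed to be equivalent to: {int/beta}{int/alpha}
DIRECT = [(INT, "beta"), (INT, "alpha")]
# The same two substitutions in the opposite order
REVERSED = [(INT, "alpha"), (ALPHA, "beta")]

SAMPLE = ("pair", ALPHA, BETA)
print(apply_sequential(SAMPLE, R_GAMMA))   # both components become int
print(apply_sequential(SAMPLE, REVERSED))  # second component is left as alpha
```

Applying the substitution for $\beta$ before the one for $\alpha$ matters here: in the reversed order, the $\alpha$ introduced by replacing $\beta$ would never be replaced by int.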

Example 3. Suppose $\alpha$ is assigned the kind $S(\mathtt{int}) \times S(\mathtt{bool})$. In this case, $\pi_1\alpha$
is equal to int and $\pi_2\alpha$ is equal to bool. We can write these equalities into a
constructor by substituting for $\alpha$ with the pair $\langle \mathtt{int}, \mathtt{bool} \rangle$.
Example 4. In the previous examples, the expansion of a variable $\alpha$ did not
contain $\alpha$, but this is not true in general. Suppose $\alpha$ is assigned the kind
$T \times S(\mathtt{int})$. In this case, $\pi_2\alpha$ is equal to int, but $\pi_1\alpha$ is not given a definition
and should not be changed. We handle this by substituting for $\alpha$ with the pair
$\langle \pi_1\alpha, \mathtt{int} \rangle$.

As this example illustrates, a good way to understand expansions is to view
them as eta-long forms^3 of constructors. This interpretation is precisely correct,
provided we view the replacement of a constructor by its singleton definition as
an eta-expansion. In fact, the ultimate definition of expansions will eta-expand
constructors uniformly, so, for example, if $\alpha$ has kind $T \times T$, its expansion will be
$\langle \pi_1\alpha, \pi_2\alpha \rangle$ (instead of just $\alpha$). This uniformity will make the correctness proof
simpler, but a practical implementation would probably optimize such cases.

Example 5. Suppose $\alpha$ is assigned the kind $\Sigma\beta{:}T.S(\beta)$. Then $\pi_2\alpha$ is known to be
equal to $\pi_1\alpha$ (although its precise value is unknown). In this case the expansion
of $\alpha$ is $\langle \pi_1\alpha, \pi_1\alpha \rangle$.

Example 6. Suppose $\alpha$ is assigned the kind $\Sigma\beta{:}S(\mathtt{int}).S(\beta)$. In this case $\pi_1\alpha$
and $\pi_2\alpha$ are equal to int and the expansion is $\langle \mathtt{int}, \mathtt{int} \rangle$.

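Generally, if $\alpha$ has the kind $\Sigma\beta{:}K_1.K_2$, the expansion of $\alpha$ will be the pair
$\langle c_1, c_2 \rangle$ where $c_1$ is the expansion of $\pi_1\alpha$, and $c_2$ is the expansion of $\pi_2\alpha$ with
the additional information that $\beta$ refers to $\pi_1\alpha$ and has kind $K_1$. We may
generalize all the examples so far with the following definition, where $R(c, K)$ is the
expansion of $c$ assuming $c$ is known to have kind $K$:

\[
\begin{array}{rcl}
R(c, T) & \stackrel{\mathrm{def}}{=} & c \\
R(c, S(c')) & \stackrel{\mathrm{def}}{=} & c' \\
R(c, \Sigma\alpha{:}K_1.K_2) & \stackrel{\mathrm{def}}{=} & \langle R(\pi_1 c, K_1),\ R(\pi_2 c, K_2\{R(\pi_1 c, K_1)/\alpha\}) \rangle
\end{array}
\]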
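
The three clauses above can be exercised on Examples 3 through 6 with a short sketch. The Python code below is an illustration under stated assumptions: the tuple encoding and the names `subst_con`, `subst_kind`, and `expand` are hypothetical, the fragment covers only $T$, singleton, and sum kinds over binder-free constructors, and substitution handles shadowing but does not rename binders, so bound names are assumed to differ from the free variables of the constructor being substituted.

```python
# Hypothetical tuple encoding (an assumption of this sketch):
#   kinds:        ("T",)  ("S", c)  ("Sigma", a, K1, K2)
#   constructors: ("var", a)  ("base", b)  ("pair", c1, c2)  ("proj", i, c)

T = ("T",)
INT = ("base", "int")
BOOL = ("base", "bool")
ALPHA = ("var", "alpha")


def subst_con(c, repl, a):
    """c{repl/a} on binder-free constructors."""
    tag = c[0]
    if tag == "var":
        return repl if c[1] == a else c
    if tag == "base":
        return c
    if tag == "pair":
        return ("pair", subst_con(c[1], repl, a), subst_con(c[2], repl, a))
    if tag == "proj":
        return ("proj", c[1], subst_con(c[2], repl, a))
    raise ValueError(f"unknown constructor: {c!r}")


def subst_kind(k, repl, a):
    """K{repl/a}; stops at a Sigma binder that shadows a (no renaming is done)."""
    tag = k[0]
    if tag == "T":
        return k
    if tag == "S":
        return ("S", subst_con(k[1], repl, a))
    if tag == "Sigma":
        _, b, k1, k2 = k
        new_k2 = k2 if b == a else subst_kind(k2, repl, a)
        return ("Sigma", b, subst_kind(k1, repl, a), new_k2)
    raise ValueError(f"unknown kind: {k!r}")


def expand(c, k):
    """R(c, K) for the T, singleton, and sum cases."""
    tag = k[0]
    if tag == "T":
        return c
    if tag == "S":
        return k[1]
    if tag == "Sigma":
        _, a, k1, k2 = k
        left = expand(("proj", 1, c), k1)
        right = expand(("proj", 2, c), subst_kind(k2, left, a))
        return ("pair", left, right)
    raise ValueError(f"unknown kind: {k!r}")


# Example 3: S(int) x S(bool); "u" is an unused binder name
print(expand(ALPHA, ("Sigma", "u", ("S", INT), ("S", BOOL))))
# Example 5: Sigma beta:T. S(beta)
print(expand(ALPHA, ("Sigma", "beta", T, ("S", ("var", "beta")))))
```

In Example 5 the second component comes out as $\pi_1\alpha$ because the kind $S(\beta)$ is first rewritten to $S(\pi_1\alpha)$ by the substitution in the sum clause.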

Example 7. Suppose $\alpha$ is assigned the kind $\Pi\beta{:}T.S(\mathtt{list}\ \beta)$ (where list : $T \to T$).
Then for any argument $c$, the application $\alpha\,c$ is equal to $\mathtt{list}\ c$. Thus, the
appropriate expansion of $\alpha$ is $\lambda\beta{:}T.\mathtt{list}\ \beta$. Note that this is the eta-long form
of list.

Example 8. Suppose $\alpha$ is assigned the kind $\Pi\beta{:}T.(T \times S(\beta))$. In this case, for
any argument $c$, $\pi_2(\alpha\,c)$ is known to be equal to $c$, but no definition is given for
$\pi_1(\alpha\,c)$. Thus, the expansion of $\alpha$ is $\lambda\beta{:}T.\langle \pi_1(\alpha\,\beta), \beta \rangle$.

These last two examples suggest the following generalization for product 
kinds: 

\[ R(c, \Pi\alpha{:}K_1.K_2) = \lambda\alpha{:}K_1.\ R(c\,\alpha, K_2) \qquad \text{(wrong)} \]

This is close to the right generalization, but, as we will see in the next section, 
it is not quite satisfactory due to the need to account for bound variables. Nev- 
ertheless, it provides good intuition on the process of expansion over product 
kinds. 

^3 That is, beta-normal forms such that no eta-expansions can be performed without
creating beta-redexes.
3.1 Bound Variables

Thus far we have exclusively considered rewriting constructors to account for the 
kinds of their free variables. To be sure that no uses of the singleton elimination 
rule are necessary, we must also consider bound variables. For example, it would 
seem as though the constructor $\lambda\alpha{:}S(\mathtt{int}).\alpha$ should be rewritten to something
like $\lambda\alpha{:}S(\mathtt{int}).\mathtt{int}$.

A naive approach would be to traverse the constructor in question and
replace every bound variable with its expansion resulting from the kind in its
binding occurrence. For example, in $\lambda\alpha{:}S(\mathtt{int}).\alpha$, the binding occurrence of $\alpha$
gives it kind $S(\mathtt{int})$, so the $\alpha$ in the abstraction’s body would be replaced by
$R(\alpha, S(\mathtt{int})) = \mathtt{int}$. However, this traversal is not sufficient to account for all
bound variables, nor in fact is it even necessary.

To see why a traversal is insufficient, suppose $\beta$ has kind $(S(\mathtt{int}) \to T) \to T$
and consider the constructors $\beta(\lambda\alpha{:}T.\alpha)$ and $\beta(\lambda\alpha{:}T.\mathtt{int})$. (Recall Section 2.1.)
In the former constructor, the binding occurrence of $\alpha$ gives it kind $T$, and
consequently the hypothetical traversal would not replace it. However, as we
saw in Section 2.1, the two constructors should be equal, and for this to happen
without the singleton elimination rule, $\alpha$ must be replaced by int in the former
constructor. What this illustrates is that when an abstraction appears in an
argument position, the abstraction’s domain kind can sometimes be strengthened
(in this case from $T$ to $S(\mathtt{int})$). This means that the kind given in a variable’s
binding occurrence cannot be relied upon.

One possibility for dealing with this would be to perform a much more com- 
plicated traversal that attempts to determine the “true” kind for every bound 
variable. Fortunately, we may deal with this in a much simpler way by shifting 
the responsibility for expanding a bound variable from the abstraction where 
that variable is bound to all constructors that might consume that abstraction. 

In the above example, $\beta$ changes the effective domain of its arguments to
$S(\mathtt{int})$; in other words, it promises only to call them with int. The expansion
process for product kinds makes this explicit. In this case, the expansion of $\beta$ is
$\lambda\gamma{:}(S(\mathtt{int}) \to T).\ \beta(\lambda\alpha{:}S(\mathtt{int}).\ \gamma\ \mathtt{int})$. After substituting this expansion for $\beta$,
each of the constructors above normalizes to $\beta(\lambda\alpha{:}S(\mathtt{int}).\mathtt{int})$. This can again
be seen as an eta-long form for $\beta$ where replacement of a variable by its definition
is considered an eta-expansion.

In general, the expansion that achieves this is: 

\[ R(c, \Pi\alpha{:}K_1.K_2) = \lambda\alpha{:}K_1.\ R(c\,\alpha, K_2)\{R(\alpha, K_1)/\alpha\} \]
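
As a check on this clause, the expansion of $\beta$ at kind $(S(\mathtt{int}) \to T) \to T$ quoted earlier can be recomputed step by step. The derivation below is a worked illustration rather than part of the original text; it writes $S(\mathtt{int}) \to T$ as $\Pi\alpha{:}S(\mathtt{int}).T$ and assumes ordinary beta-reduction for the final normalization.

```latex
\begin{align*}
R(\gamma,\ \Pi\alpha{:}S(\mathtt{int}).T)
  &= \lambda\alpha{:}S(\mathtt{int}).\ R(\gamma\,\alpha, T)\{R(\alpha, S(\mathtt{int}))/\alpha\} \\
  &= \lambda\alpha{:}S(\mathtt{int}).\ (\gamma\,\alpha)\{\mathtt{int}/\alpha\} \\
  &= \lambda\alpha{:}S(\mathtt{int}).\ \gamma\ \mathtt{int} \\[4pt]
R(\beta,\ \Pi\gamma{:}(S(\mathtt{int}) \to T).T)
  &= \lambda\gamma{:}(S(\mathtt{int}) \to T).\ R(\beta\,\gamma, T)\{R(\gamma, S(\mathtt{int}) \to T)/\gamma\} \\
  &= \lambda\gamma{:}(S(\mathtt{int}) \to T).\ \beta\,(\lambda\alpha{:}S(\mathtt{int}).\ \gamma\ \mathtt{int}) \\[4pt]
\bigl(\lambda\gamma{:}(S(\mathtt{int}) \to T).\ \beta\,(\lambda\alpha{:}S(\mathtt{int}).\ \gamma\ \mathtt{int})\bigr)\,(\lambda\alpha{:}T.\alpha)
  &\longrightarrow \beta\,(\lambda\alpha{:}S(\mathtt{int}).\ (\lambda\alpha{:}T.\alpha)\ \mathtt{int}) \\
  &\longrightarrow \beta\,(\lambda\alpha{:}S(\mathtt{int}).\ \mathtt{int})
\end{align*}
```

Replacing the argument $\lambda\alpha{:}T.\alpha$ by $\lambda\alpha{:}T.\mathtt{int}$ in the last two lines yields the same normal form, which is the behaviour described above.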

Making this expansion part of the substitution for free variables accounts for 
all cases in which the kind of an abstraction (and therefore its domain kind) 
is given by some other constructor to which the abstraction is passed as an 
argument. The only other way a kind may be imposed on an abstraction is at 
the top level. Again recall Section 2.1 and consider the constructors $\lambda\alpha{:}T.\alpha$ and
$\lambda\alpha{:}T.\mathtt{int}$. These constructors should be considered equivalent when compared
as members of kind $S(\mathtt{int}) \to T$, but not as members of $T \to T$. Thus, the
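\[
\begin{array}{rcl}
R(c, T) & \stackrel{\mathrm{def}}{=} & c \\
R(c, S(c')) & \stackrel{\mathrm{def}}{=} & c' \\
R(c, \Pi\alpha{:}K_1.K_2) & \stackrel{\mathrm{def}}{=} & \lambda\alpha{:}K_1.\ R(c\,R(\alpha, K_1),\ K_2\{R(\alpha, K_1)/\alpha\}) \\
 & & \text{(where $\alpha$ is not free in $c$ or $K_1$)} \\
R(c, \Sigma\alpha{:}K_1.K_2) & \stackrel{\mathrm{def}}{=} & \langle R(\pi_1 c, K_1),\ R(\pi_2 c, K_2\{R(\pi_1 c, K_1)/\alpha\}) \rangle \\
R(\alpha_1{:}K_1, \ldots, \alpha_n{:}K_n) & \stackrel{\mathrm{def}}{=} & \{R(\alpha_n, K_n)/\alpha_n\} \cdots \{R(\alpha_1, K_1)/\alpha_1\}
\end{array}
\]

Fig. 5. Expansions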



elimination process must be affected by the kinds in which a constructor is 
considered to lie. 

This is neatly dealt with by (in addition to substituting expansions for
free variables) expanding the entire constructor using the kind to which it
belongs. Thus, when considered as members of $S(\mathtt{int}) \to T$, the two constructors
above become $\lambda\alpha{:}S(\mathtt{int}).((\lambda\alpha{:}T.\alpha)\ \mathtt{int})$ and $\lambda\alpha{:}S(\mathtt{int}).((\lambda\alpha{:}T.\mathtt{int})\ \mathtt{int})$; each
of which normalizes to $\lambda\alpha{:}S(\mathtt{int}).\mathtt{int}$. However, when considered as members
of $T \to T$, the two become $\lambda\alpha{:}T.((\lambda\alpha{:}T.\alpha)\,\alpha)$ and $\lambda\alpha{:}T.((\lambda\alpha{:}T.\mathtt{int})\,\alpha)$; each of
which normalizes to its original form.
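
The expansion function of Figure 5, including the product case and the substitution $R(\Gamma)$, can be sketched as follows. This Python code is an illustration under stated assumptions, not the paper's implementation: the tuple encoding and all function names are hypothetical, and substitution handles shadowing but does not rename binders, so bound names are assumed to differ from the free variables of whatever is substituted.

```python
# Hypothetical tuple encoding (an assumption of this sketch):
#   kinds:        ("T",)  ("S", c)  ("Pi", a, K1, K2)  ("Sigma", a, K1, K2)
#   constructors: ("var", a)  ("base", b)  ("lam", a, K, c)
#                 ("app", c1, c2)  ("pair", c1, c2)  ("proj", i, c)

T = ("T",)
INT = ("base", "int")
S_INT = ("S", INT)


def subst_con(c, repl, a):
    """c{repl/a} on constructors (shadowing respected, no renaming)."""
    tag = c[0]
    if tag == "var":
        return repl if c[1] == a else c
    if tag == "base":
        return c
    if tag == "lam":
        _, b, k, body = c
        new_body = body if b == a else subst_con(body, repl, a)
        return ("lam", b, subst_kind(k, repl, a), new_body)
    if tag in ("app", "pair"):
        return (tag, subst_con(c[1], repl, a), subst_con(c[2], repl, a))
    if tag == "proj":
        return ("proj", c[1], subst_con(c[2], repl, a))
    raise ValueError(f"unknown constructor: {c!r}")


def subst_kind(k, repl, a):
    """K{repl/a} on kinds (shadowing respected, no renaming)."""
    tag = k[0]
    if tag == "T":
        return k
    if tag == "S":
        return ("S", subst_con(k[1], repl, a))
    if tag in ("Pi", "Sigma"):
        _, b, k1, k2 = k
        new_k2 = k2 if b == a else subst_kind(k2, repl, a)
        return (tag, b, subst_kind(k1, repl, a), new_k2)
    raise ValueError(f"unknown kind: {k!r}")


def expand(c, k):
    """R(c, K), following the clauses of Figure 5."""
    tag = k[0]
    if tag == "T":
        return c
    if tag == "S":
        return k[1]
    if tag == "Pi":
        _, a, k1, k2 = k  # assumes a is not free in c or k1
        arg = expand(("var", a), k1)
        return ("lam", a, k1, expand(("app", c, arg), subst_kind(k2, arg, a)))
    if tag == "Sigma":
        _, a, k1, k2 = k
        left = expand(("proj", 1, c), k1)
        right = expand(("proj", 2, c), subst_kind(k2, left, a))
        return ("pair", left, right)
    raise ValueError(f"unknown kind: {k!r}")


def expand_assignment(gamma):
    """R(Gamma) as a list of (expansion, variable) pairs, leftmost applied first."""
    return [(expand(("var", a), k), a) for a, k in reversed(gamma)]


# beta : (S(int) -> T) -> T
BETA_KIND = ("Pi", "gamma", ("Pi", "alpha", S_INT, T), T)
print(expand(("var", "beta"), BETA_KIND))
# The identity function considered as a member of S(int) -> T
IDENT = ("lam", "alpha", T, ("var", "alpha"))
print(expand(IDENT, ("Pi", "alpha", S_INT, T)))
```

The sketch produces unnormalized results such as $\lambda\alpha{:}S(\mathtt{int}).((\lambda\alpha{:}T.\alpha)\ \mathtt{int})$; a beta-normalizer, omitted here, would be needed to reach the normal forms quoted in the text.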

It is worth noting that the required top-level expansion adds very little com- 
plexity to the use of singleton elimination in practice. In this paper we have 
largely ignored the term-level constructs of the intermediate language in ques- 
tion, but, in fact, constructors lie within surrounding terms, and elimination of 
singleton kinds in constructors is part of an overall transformation on terms. 
Typically, constructors appearing within terms are simply types (the domain of 
a lambda, for example), and in such cases the top-level expansion has no effect 
at all (since $R(c, T) = c$). In other cases constructors may be considered to lie
in more interesting kinds (such as with the argument to a constructor abstrac- 
tion), but in all such cases the intended kind is clearly given by context and the 
top-level expansion is still easy to perform. 



3.2 The Elimination Process 

The full definitions of the expansion constructors and substitutions are given in
Figure 5. Using expansion, the singleton kind elimination proceeds in three steps:
Given a constructor $c$ considered to have kind $K$ under assignment $\Gamma$, we first
expand $c$, resulting in $R(c, K)$. Second, we substitute expansions for all free
variables, resulting in $R(c, K)\{R(\Gamma)\}$. Third, we erase any remaining singleton
kinds, resulting in $(R(c, K)\{R(\Gamma)\})^\circ$. This elimination process is easily seen to
be terminating, since $R$ is defined by induction over the structure of kinds.
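
As a small worked instance of the three steps (a sketch only: the assignment
and constructor are chosen here for illustration, and erasure is assumed to
replace each singleton kind $S(c)$ by $T$), take $\Gamma = \alpha{:}S(\mathrm{int})$,
$c = \alpha$ and $K = T$:

```latex
\begin{align*}
R(\alpha, T) &= \alpha
  && \text{(step 1: expansion at kind } T \text{)} \\
R(\Gamma) &= \{R(\alpha, S(\mathrm{int}))/\alpha\} = \{\mathrm{int}/\alpha\}
  && \text{(expansion of the assignment)} \\
R(\alpha, T)\{R(\Gamma)\} &= \mathrm{int}
  && \text{(step 2: substitution)} \\
(R(\alpha, T)\{R(\Gamma)\})^\circ &= \mathrm{int}, \qquad
\Gamma^\circ = \alpha{:}T, \qquad K^\circ = T
  && \text{(step 3: erasure)}
\end{align*}
```

Under these assumptions the equation $\Gamma \vdash \alpha = \mathrm{int} : T$, which relies on
the singleton kind of $\alpha$, is carried to $\Gamma^\circ \vdash_{sf} \mathrm{int} = \mathrm{int} : T$,
which holds by reflexivity alone.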

Expansion of constructors is shown to be well-defined by induction on the structure 
of the kind, ignoring the contents of singleton kinds. 

We may state the following correctness theorem for the elimination process,
which states that rewritten constructors will be equivalent if and only if the
original constructors were equivalent:

Theorem 1. Suppose $\Gamma \vdash c_1 : K$ and $\Gamma \vdash c_2 : K$. Then $\Gamma \vdash c_1 = c_2 : K$ if and
only if $\Gamma^\circ \vdash_{sf} (R(c_1, K)\{R(\Gamma)\})^\circ = (R(c_2, K)\{R(\Gamma)\})^\circ : K^\circ$.

The proof of the correctness theorem is the subject of the next section. 

4 Correctness Proof 

The previous section’s informal discussion motivates why we might expect the 
elimination process to be correct. Unfortunately, Theorem 1 defies direct proof,
because there are too many ways that a judgement might be derived, and those 
derivations have no particular structure in common. We may see a reason why 
the proof is difficult by considering the theorem’s implications. Since it is easy 
to determine equality of constructors in the singleton-free system, the theorem 
provides a simple test for equality: translate constructors into the singleton-free 
system and check that they are equal there. The theorem states that such a test 
is sound and complete. However, this also indicates that proving the theorem 
is at least as difficult as proving decidability of constructor equality in the full 
system. 

The decidability of constructor equality has recently been shown by Stone
and Harper. They provide an algorithm for deciding constructor equality and
prove that algorithm sound and complete using a Kripke-style logical relation.
In addition to settling the decidability question, they provide a tool with which
we may prove Theorem 1. One approach would be to follow Stone and Harper
and prove the theorem directly using a logical relation. This approach is not 
attractive, due to the substantial complexity of the arguments involved. However, 
we may still take advantage of their result. 

The proof works essentially by using Stone and Harper’s algorithm to normal- 
ize the derivations of equality judgements. Given a derivable equality judgement, 
we use completeness of the algorithm to deduce the existence of a derivation in 
the algorithmic system. That derivation can have only one form, making it much 
easier to reason about. 

Due to space limitations, we do not present the entire proof here, and instead 
only present the key lemmas and definitions. The full details may be found in 
the companion technical report [2].

The only-if portion of the proof (the difficult part, as it turns out) is struc- 
tured as follows: 

1. Suppose $\Gamma \vdash c_1 = c_2 : K$.

2. Prove that constructors are equal to their expansions; that is,
   $\Gamma \vdash c_1 = R(c_1, K)\{R(\Gamma)\} : K$ and $\Gamma \vdash c_2 = R(c_2, K)\{R(\Gamma)\} : K$. By
   symmetry and transitivity it follows that the expansions are equal:
   $\Gamma \vdash R(c_1, K)\{R(\Gamma)\} = R(c_2, K)\{R(\Gamma)\} : K$.

3. By algorithmic completeness, deduce that there exists a derivation of the
   algorithmic judgement
   $\Gamma \vdash R(c_1, K)\{R(\Gamma)\} : K \Leftrightarrow \Gamma \vdash R(c_2, K)\{R(\Gamma)\} : K$.

4. Prove that singleton reduction (the algorithmic counterpart of the singleton
   elimination rule) is not used in the algorithmic derivation. This step is the
   heart of the proof.

5. By algorithmic soundness, deduce that there exists a derivation of
   $\Gamma \vdash R(c_1, K)\{R(\Gamma)\} = R(c_2, K)\{R(\Gamma)\} : K$ in which the singleton
   elimination rule (Rule 34) is not used (except within subderivations for
   kinding or subkinding judgements).

6. Prove that therefore there exists a derivation of
   $\Gamma^\circ \vdash_{sf} (R(c_1, K)\{R(\Gamma)\})^\circ = (R(c_2, K)\{R(\Gamma)\})^\circ : K^\circ$.

Once the only-if portion is proved, the converse is easily established. The
converse’s proof is discussed in Section 4.3.

We begin by stating two lemmas that establish that well-formed constructors 
are equal to their expansions. These are each proven by straightforward induc- 
tions. It then follows by transitivity that when constructors are equal, so are 
their expansions. 

Lemma 1. If $\Gamma \vdash c : K$ then $\Gamma \vdash c = R(c, K) : K$.

Lemma 2. If $\Gamma \vdash c : K$ then $\Gamma \vdash c = R(c, K)\{R(\Gamma)\} : K$.

Corollary 1. If $\Gamma \vdash c_1 = c_2 : K$ then
$\Gamma \vdash R(c_1, K)\{R(\Gamma)\} = R(c_2, K)\{R(\Gamma)\} : K$.

4.1 The Decision Algorithm 

Stone and Harper’s decision algorithm for constructor equivalence is given in
Figure 6. This algorithm is unusual in that it is a six-place algorithm; it maintains
two assignments and two kinds. This allows the two halves of the algorithm to
operate independently, which is critical to Stone and Harper’s proof and to this
one. In common usage, the two assignments and the two kinds are equivalent
(but often not identical). The critical singleton reduction rule appears as the
ninth clause.

The algorithm works as follows: 

1. The algorithm is presented with a query of the form
   $\Gamma \vdash c : K \Leftrightarrow \Gamma' \vdash c' : K'$. When $\vdash \Gamma = \Gamma'$ and $\Gamma \vdash K = K'$, this
   determines whether $\Gamma \vdash c = c' : K$ is derivable.

Footnote: Stone and Harper also prove their six-place algorithm equivalent to a
conventional four-place algorithm employing judgements of the form
$\Gamma \vdash c_1 \Leftrightarrow c_2 : K$, which is preferable in practice.

Footnote: It is awkward to render six-place judgements in spoken language. My
preferred rendering of the algorithmic judgement is “In assignments $\Gamma$ and $\Gamma'$,
$c$ and $c'$ are related at kinds $K$ and $K'$.”

2. The constructor equivalence rules add appropriate elimination forms
   (applications or projections) to the constructors being compared in order to
   drive them down to kind $T$ or a singleton kind. Then those constructors are
   reduced to weak head normal form.

3. Elimination contexts ($E$) are defined in the usual manner, as shown below.
   A constructor of the form $E[\alpha]$ is referred to as a path, and $\alpha$ is called the
   head of the path. We will often use the metavariable $p$ to range over paths.

   $E ::= [\,] \mid E\,c \mid \pi_1 E \mid \pi_2 E$

   A constructor is reduced to weak head normal form by alternating beta
   reductions and singleton reductions. Beta reduction of a constructor is
   performed by placing it in the form $E[c]$ where $c$ is a beta redex, and
   reducing to $E[c']$ where $c'$ is the corresponding contractum. Repetition of this
   will ultimately result in a path (if the constructor is well-formed, which is
   assumed).

4. Singleton reduction of a path $p$ is performed by determining its natural kind,
   and replacing $p$ with $c$ whenever $p$’s natural kind is some singleton kind $S(c)$.
   (Formally, the algorithm adds an elimination context, reducing $E[p]$ to $E[c]$
   when $p$ has natural kind $S(c)$, but $E$ will be empty when $E[p]$ is well-formed.)
   Note that the natural kind of a path is not a principal kind. For example, if
   $\Gamma(\alpha) = T$ then the natural kind of $\alpha$ is $T$, but $\alpha$ has principal kind $S(\alpha)$.

5. When no more beta or singleton reductions apply, the algorithm compares
   the two paths, checking that they have the same head variable and the same
   series of eliminations. When checking that two applications are the same, the
   main algorithm is reinvoked to determine whether the arguments are equal.
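
Step 5 can be sketched as follows. This is a simplified illustration, not
Stone and Harper's kind-directed path comparison: it ignores kinds entirely,
the tuple encoding and the names `spine` and `same_path` are hypothetical, and
the comparison of application arguments is delegated to a caller-supplied
function standing in for the reinvoked main algorithm.

```python
# Constructors as tagged tuples (hypothetical encoding):
#   ("var", name), ("app", function, argument), ("proj", i, constructor)

def spine(p):
    """Split a path E[a] into its head variable and its eliminations,
    listed innermost first."""
    elims = []
    while p[0] in ("app", "proj"):
        if p[0] == "app":
            elims.append(("app", p[2]))
            p = p[1]
        else:
            elims.append(("proj", p[1]))
            p = p[2]
    if p[0] != "var":
        raise ValueError("not a path: head is not a variable")
    return p[1], elims[::-1]

def same_path(p, q, args_equal):
    """Same head variable and the same series of eliminations;
    application arguments are compared with args_equal."""
    head_p, elims_p = spine(p)
    head_q, elims_q = spine(q)
    if head_p != head_q or len(elims_p) != len(elims_q):
        return False
    for (tag1, x1), (tag2, x2) in zip(elims_p, elims_q):
        if tag1 != tag2:
            return False
        if tag1 == "proj" and x1 != x2:
            return False
        if tag1 == "app" and not args_equal(x1, x2):
            return False
    return True

# (pi_1 a) b  versus  (pi_1 a) b  and  (pi_2 a) b
p = ("app", ("proj", 1, ("var", "a")), ("var", "b"))
q = ("app", ("proj", 1, ("var", "a")), ("var", "b"))
r = ("app", ("proj", 2, ("var", "a")), ("var", "b"))
syntactic = lambda x, y: x == y   # stand-in for the main algorithm
```

Here `same_path(p, q, syntactic)` succeeds, while `same_path(p, r, syntactic)`
fails because the projections differ.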

We may state the following correctness theorem for the algorithm: 

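Theorem 2 (Stone-Harper).

1. (Completeness) If $\Gamma \vdash c_1 = c_2 : K$ then $\Gamma \vdash c_1 : K \Leftrightarrow \Gamma \vdash c_2 : K$.

2. (Soundness) Suppose $\vdash \Gamma = \Gamma'$, $\Gamma \vdash K = K'$, $\Gamma \vdash c_1 : K$ and $\Gamma' \vdash c_2 : K'$.
   Then if $\Gamma \vdash c_1 : K \Leftrightarrow \Gamma' \vdash c_2 : K'$ then $\Gamma \vdash c_1 = c_2 : K$.

Corollary 2. If $\Gamma \vdash c_1 = c_2 : K$ then
$\Gamma \vdash R(c_1, K)\{R(\Gamma)\} : K \Leftrightarrow \Gamma \vdash R(c_2, K)\{R(\Gamma)\} : K$.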

There is one minor difference between this algorithm and the one presented
in Stone and Harper. When checking constructor equivalence at a singleton kind,
Stone and Harper’s algorithm immediately succeeds, while the algorithm here
behaves the same as when comparing at kind $T$. However, Stone and Harper’s
proof goes through in almost exactly the same way, with only a change to one
subcase of their “Main Lemma.” Their algorithm is more efficient, since it
terminates early in some cases, but for our purposes we are not concerned with
efficiency. The advantage of this version of the algorithm is that we may obtain
the stronger version of soundness given in Theorem 4.

[Figure body not reproduced; panel: Natural kind extraction]

Fig. 6. Constructor Equivalence Algorithm (Six-Place Version)



Definition 3. A derivation is mostly free of singleton elimination if every use
of singleton elimination (Rule 34) in that derivation lies within a subderivation
whose root is a constructor formation or subkinding judgement.



Theorem 4 (Singleton-free soundness). Suppose $\vdash \Gamma = \Gamma'$, $\Gamma \vdash K = K'$,
$\Gamma \vdash c_1 : K$ and $\Gamma' \vdash c_2 : K'$. Then if $\Gamma \vdash c_1 : K \Leftrightarrow \Gamma' \vdash c_2 : K'$ without
using singleton reduction then there exists a derivation of $\Gamma \vdash c_1 = c_2 : K$ that
is mostly free of singleton elimination.

Proof. By inspection of Stone and Harper’s proof. 

Theorem 4 fails with the more efficient version of the algorithm because
when $\Gamma_1 \vdash c_1 : S(c_1') \Leftrightarrow \Gamma_2 \vdash c_2 : S(c_2')$, the soundness proof must use singleton
elimination to show that $c_1$ and $c_1'$ are equal and that $c_2$ and $c_2'$ are equal, in the
course of showing that $c_1$ and $c_2$ are equal.

In the next section we will show that the algorithmic derivation shown to exist
by Corollary 2 is free of singleton reduction. Then Theorem 4 will permit us to
conclude that the corresponding derivation in the declarative system is mostly
free of singleton elimination. A derivation mostly free of singleton elimination
uses singleton elimination in no significant manner; any residual uses (within
constructor formation or subkinding) will be removed by singleton erasure in
Section 4.3.

4.2 Absence of Singleton Reduction 

The heart of the proof is to show that singleton reduction will not be used in a 
derivation of algorithmic equivalence of expanded constructors. It is here that we 
really show that expansion works to eliminate singleton kinds: if the algorithm 
is able to deduce that the two expanded terms are equal without using singleton 
reduction, then we have obviated the need for singleton kinds. 

The proof works by defining a condition, called protectedness, that is satisfied 
by expanded constructors, that rules out any need for singleton reduction, and 
that is preserved by the algorithm. First we make some preliminary definitions: 

Definition 5.

- Two kinds $K$ and $K'$ are similar (written $K \approx K'$) if they are the same
  modulo the contents of singleton kinds. That is, similarity is the least
  congruence such that $S(c) \approx S(c')$ for any constructors $c$ and $c'$.

- Two assignments $\Gamma$ and $\Gamma'$ are similar (written $\Gamma \approx \Gamma'$) if they bind the
  same variables in the same order, and if $\Gamma(\alpha) \approx \Gamma'(\alpha)$ for all $\alpha \in \mathrm{Dom}(\Gamma)$.

Note that a well-formed kind can be similar to an ill-formed kind, and likewise 
for assignments. When two kinds or two assignments are similar, they are said 
to have the same shape. For the proof of the absence of singleton reductions, 
we will be able to disregard the actual kinds and assignments being used and 
consider only their shapes; this will simplify the proof considerably. This works 
because the contents of singleton kinds are only pertinent to singleton reduction, 
which we are showing never takes place. 
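
The notion of shape can be made concrete with a short sketch. The tuple
encoding and function names below are hypothetical. Because variables occur
in kinds only inside singleton contents, which similarity ignores, the sketch
does not need to compare binder names in kinds.

```python
# Kinds as tagged tuples (hypothetical encoding):
#   ("T",), ("S", constructor), ("Pi", a, K1, K2), ("Sigma", a, K1, K2)

def similar(k1, k2):
    """K ~ K': equal up to the contents of singleton kinds."""
    if k1[0] != k2[0]:
        return False
    if k1[0] in ("T", "S"):
        return True          # the constructor inside S(...) is ignored
    # Pi or Sigma: compare domain and codomain kinds structurally
    return similar(k1[2], k2[2]) and similar(k1[3], k2[3])

def similar_assignments(g1, g2):
    """Assignments are lists of (variable, kind) pairs: they must bind the
    same variables in the same order, at similar kinds."""
    return len(g1) == len(g2) and all(
        a == b and similar(k, l) for (a, k), (b, l) in zip(g1, g2)
    )

T = ("T",)
s_int = ("S", ("base", "int"))
s_bool = ("S", ("base", "bool"))
g1 = [("a", s_int), ("b", ("Pi", "x", T, s_bool))]
g2 = [("a", s_bool), ("b", ("Pi", "y", T, s_int))]
```

Here `g1` and `g2` have the same shape, whereas `s_int` and `T` do not:
similarity forgets what a singleton kind contains, not that it is a
singleton.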

We also define contexts (C) as shown below. Note that contexts are defined 
to have exactly one hole, and note also that elimination contexts are a subclass 
of contexts. As we are not concerned with the contents of singleton kinds, there 
is no need for contexts to account for constructors appearing within the domain 
kind of a lambda abstraction. Instantiation of a context is defined in the usual
manner; in particular, it is permissible for instantiation to capture free variables.

$C ::= [\,] \mid \lambda\alpha{:}K.C \mid C\,c \mid c\,C \mid \langle C, c \rangle \mid \langle c, C \rangle \mid \pi_1 C \mid \pi_2 C$

Finally, we define weak head reduction without an assignment in the usual
manner (that is, $E[(\lambda\alpha{:}K.c)\,c'] \to E[c\{c'/\alpha\}]$ and $E[\pi_i \langle c_1, c_2 \rangle] \to E[c_i]$). Note
that if $c_1 \to c_2$ then $\Gamma \vdash c_1 \to c_2$ (recall algorithmic weak head reduction).
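
The two reduction rules can be sketched directly. The encoding below is
hypothetical, kind annotations on lambdas are treated as opaque labels, and
substitution respects shadowing but assumes that capture cannot occur. The
recursion in `whr_step` follows the grammar of elimination contexts: it
descends only through the function position of an application or the
argument of a projection.

```python
# Constructors as tagged tuples (hypothetical encoding):
#   ("var", a), ("base", b), ("lam", a, kind, body),
#   ("app", f, c), ("pair", c1, c2), ("proj", i, c)

def subst(c, x, r):
    """c{r/x}; respects shadowing, assumes no capture is possible."""
    tag = c[0]
    if tag == "var":
        return r if c[1] == x else c
    if tag == "lam":
        _, a, kind, body = c
        return c if a == x else ("lam", a, kind, subst(body, x, r))
    if tag in ("app", "pair"):
        return (tag, subst(c[1], x, r), subst(c[2], x, r))
    if tag == "proj":
        return ("proj", c[1], subst(c[2], x, r))
    return c  # base constants

def whr_step(c):
    """One weak head step E[redex] -> E[contractum], or None if none applies."""
    if c[0] == "app":
        f, arg = c[1], c[2]
        if f[0] == "lam":                 # E = [], beta redex
            return subst(f[3], f[1], arg)
        inner = whr_step(f)               # E = E' arg
        return None if inner is None else ("app", inner, arg)
    if c[0] == "proj":
        i, p = c[1], c[2]
        if p[0] == "pair":                # E = [], projection redex
            return p[i]                   # p[1] is c1, p[2] is c2
        inner = whr_step(p)               # E = pi_i E'
        return None if inner is None else ("proj", i, inner)
    return None

def whnf(c):
    """Iterate weak head reduction until no step applies."""
    while True:
        nxt = whr_step(c)
        if nxt is None:
            return c
        c = nxt

ident = ("lam", "a", "T", ("var", "a"))
int_, bool_ = ("base", "int"), ("base", "bool")
term = ("app", ("proj", 1, ("pair", ident, bool_)), int_)
path = ("app", ("var", "b"), ("app", ident, int_))
```

`whnf(term)` first projects out `ident` and then applies it, yielding `int_`.
`whnf(path)` returns `path` unchanged: its head is a variable, and the redex
in argument position is not in an elimination context, so it is left alone.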

We are now ready to define the protectedness property. The intuition is that 
a constructor is protected if every variable in that constructor appears in an 
elimination context that drives it down to kind T (i.e., that performs elimination
operations on it resulting in a constructor of kind T). By implication, this means 
that no variable appears in an elimination context driving it down to a singleton 
kind. In other words, no path within the constructor will have a singleton natural 
kind and consequently singleton reduction will not take place. In order to ensure 
that protectedness is preserved by the algorithm, we strengthen the condition so 
that the elimination context that drives a variable to kind T must be appropriate. 
An elimination context is appropriate if, for every application appearing in that 
context, the argument constructor is protected (and, moreover, is still protected 
when driven to kind T and weak head normalized). 

Definition 6. Suppose $\Gamma$ is an assignment and $K$ is a kind. The relations
$\Gamma$-protected, $K$-$\Gamma$-appropriate, and $K$-$\Gamma$-protected are the least relations such that:

1. Protectedness

   - A constructor $c$ is $\Gamma$-protected if whenever $c = C[\alpha]$ (where $\alpha \in \mathrm{Dom}(\Gamma)$
     and $C$ does not capture $\alpha$), there exist $C'$ and $E$ such that $C[\,] = C'[E[\,]]$,
     and $E[\alpha]$ is $T$-$\Gamma$-appropriate.

2. Appropriateness

   - A path $\alpha$ is $K$-$\Gamma$-appropriate if $\Gamma(\alpha) \approx K$.

   - A path $p\,c$ is $K_2$-$\Gamma$-appropriate if $p$ is $(\Pi\alpha{:}K_1.K_2)$-$\Gamma$-appropriate and $c$
     is $K_1$-$\Gamma$-protected.

   - A path $\pi_1 p$ is $K_1$-$\Gamma$-appropriate if $p$ is $(\Sigma\alpha{:}K_1.K_2)$-$\Gamma$-appropriate.

   - A path $\pi_2 p$ is $K_2$-$\Gamma$-appropriate if $p$ is $(\Sigma\alpha{:}K_1.K_2)$-$\Gamma$-appropriate.

3. Protectedness relative to a kind

   - A constructor $c$ is $T$-$\Gamma$-protected if $c$ is $\Gamma$-protected.

   - A constructor $c$ is $S(c')$-$\Gamma$-protected if $c$ is $\Gamma$-protected.

   - A lambda abstraction $\lambda\alpha{:}K_1'.c$ is $(\Pi\alpha{:}K_1.K_2)$-$\Gamma$-protected if $c$ is
     $K_2$-$(\Gamma, \alpha{:}K_1)$-protected.

   - A pair $\langle c_1, c_2 \rangle$ is $(\Sigma\alpha{:}K_1.K_2)$-$\Gamma$-protected if $c_1$ is $K_1$-$\Gamma$-protected and $c_2$
     is $K_2$-$\Gamma$-protected.

Footnote: As opposed to the algorithm’s judgement $\Gamma \vdash c_1 \to c_2$ for weak head
reduction within an assignment $\Gamma$.

Note that the relations being defined appear only positively above, so
Definition 6 is a valid inductive definition. Also, note that these definitions are
concerned with kinds only up to similarity, and for this reason the definition can
safely ignore the presence of free variables in kinds and assignments.

We are now ready to prove the main lemma: 

Lemma 3 (Main Lemma).

1. If $\Gamma_1 \vdash c_1 : K_1 \Leftrightarrow \Gamma_2 \vdash c_2 : K_2$ is derivable, $c_1 \to^* c_1'$, $c_2 \to^* c_2'$, $c_1'$ is
   $K_1$-$\Gamma_1$-protected, and $c_2'$ is $K_2$-$\Gamma_2$-protected, then the derivation does not use
   singleton reduction.

2. If $\Gamma_1 \vdash p_1 \uparrow K_1 \leftrightarrow \Gamma_2 \vdash p_2 \uparrow K_2$ is derivable, $p_1$ is $K_1$-$\Gamma_1$-appropriate, and
   $p_2$ is $K_2$-$\Gamma_2$-appropriate, then the derivation does not use singleton reduction.

Proof. By induction on the algorithmic derivation, using a substitution lemma 
to establish that protectedness is preserved by the weak head reduction. 

It remains to show that expanded constructors are protected. In the following 
lemma, protectedness is lifted to kinds in the obvious manner. 

Lemma 4.

1. If $p$ is $K$-$\Gamma$-appropriate and $K$ is $\Gamma$-protected then $R(p, K)$ is $\Gamma$-protected.

2. If $c$ and $K$ are $\Gamma$-protected then $R(c, K)$ is $K$-$\Gamma$-protected.

Corollary 3. If $\Gamma \vdash \mathrm{ok}$ then $R(c, K)\{R(\Gamma)\}$ is $K$-$\Gamma$-protected.

Corollary 4. If $\Gamma \vdash c_1 = c_2 : K$ then there exists a derivation of
$\Gamma \vdash R(c_1, K)\{R(\Gamma)\} = R(c_2, K)\{R(\Gamma)\} : K$ that is mostly free of singleton
elimination.



4.3 Wrapping Up 

To complete the first half of the proof, we need only the fact that singleton era- 
sure preserves derivability of judgements with mostly singleton free derivations. 

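Lemma 5.

1. If $\Gamma \vdash c_1 = c_2 : K$ has a derivation mostly free of singleton elimination, then
   $\Gamma^\circ \vdash_{sf} c_1^\circ = c_2^\circ : K^\circ$.

2. If $\Gamma \vdash c : K$ then $\Gamma^\circ \vdash_{sf} c^\circ : K^\circ$.

3. If $\Gamma \vdash K_1 \le K_2$ then $K_1^\circ = K_2^\circ$.

4. If $\Gamma \vdash \mathrm{ok}$ then $\Gamma^\circ \vdash_{sf} \mathrm{ok}$.

Corollary 5. If $\Gamma \vdash c_1 = c_2 : K$ then
$\Gamma^\circ \vdash_{sf} (R(c_1, K)\{R(\Gamma)\})^\circ = (R(c_2, K)\{R(\Gamma)\})^\circ : K^\circ$.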

For the converse, we already have most of the facts we need at our disposal.
We require two more lemmas. One states that the algorithm is symmetric and 
transitive. It is here that the use of a six-place algorithm is critical. For the 
six-place algorithm it is easy to show that symmetry and transitivity hold. For a 
four-place algorithm, on the other hand, it is a deep fact depending on soundness 
and completeness that symmetry and transitivity hold for well-formed instances, 
and for ill-formed instances it is not known to hold at all. 

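Lemma 6.

1. If $\Gamma_1 \vdash c_1 : K_1 \Leftrightarrow \Gamma_2 \vdash c_2 : K_2$ then $\Gamma_2 \vdash c_2 : K_2 \Leftrightarrow \Gamma_1 \vdash c_1 : K_1$.

2. If $\Gamma_1 \vdash c_1 : K_1 \Leftrightarrow \Gamma_2 \vdash c_2 : K_2$ and $\Gamma_2 \vdash c_2 : K_2 \Leftrightarrow \Gamma_3 \vdash c_3 : K_3$ then
   $\Gamma_1 \vdash c_1 : K_1 \Leftrightarrow \Gamma_3 \vdash c_3 : K_3$.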

The other lemma states that if singleton reduction is not employed in the 
algorithm, then whatever singleton kinds appear are not relevant and may be 
erased. Moreover, since the two halves of the algorithm operate independently 
(here again the six-place algorithm is critical), we may erase them from either 
half of the algorithm. 

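Lemma 7.

1. If $\Gamma_1 \vdash c_1 : K_1 \Leftrightarrow \Gamma_2 \vdash c_2 : K_2$ without using singleton reduction, then
   $\Gamma_1 \vdash c_1 : K_1 \Leftrightarrow \Gamma_2^\circ \vdash c_2^\circ : K_2^\circ$.

2. If $\Gamma_1 \vdash p_1 \uparrow K_1 \leftrightarrow \Gamma_2 \vdash p_2 \uparrow K_2$ without using singleton reduction, then
   $\Gamma_1 \vdash p_1 \uparrow K_1 \leftrightarrow \Gamma_2^\circ \vdash p_2^\circ \uparrow K_2^\circ$.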

It is worth noting that the algorithmic judgement in Lemma 7 is quite
peculiar, in that $\Gamma$ is ordinarily not equal to $\Gamma^\circ$ and $K$ is ordinarily not equal to $K^\circ$.
Although there is a valid derivation of this algorithmic judgement, the sound- 
ness theorem does not apply, so it does not correspond to any derivation in the 
declarative system. When we apply this lemma below we will use transitivity to 
bring the assignments and kinds back into agreement before invoking soundness. 

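Lemma 8. If $\Gamma \vdash c_1 : K$, $\Gamma \vdash c_2 : K$ and
$\Gamma^\circ \vdash_{sf} (R(c_1, K)\{R(\Gamma)\})^\circ = (R(c_2, K)\{R(\Gamma)\})^\circ : K^\circ$ then $\Gamma \vdash c_1 = c_2 : K$.

Proof. By Lemma 2, $\Gamma \vdash c_1 = R(c_1, K)\{R(\Gamma)\} : K$. By algorithmic
completeness, $\Gamma \vdash c_1 : K \Leftrightarrow \Gamma \vdash R(c_1, K)\{R(\Gamma)\} : K$. By symmetry and
transitivity of the algorithm,
$\Gamma \vdash R(c_1, K)\{R(\Gamma)\} : K \Leftrightarrow \Gamma \vdash R(c_1, K)\{R(\Gamma)\} : K$.
Then, by Corollary 3 and Lemmas 3 and 7,
$\Gamma \vdash R(c_1, K)\{R(\Gamma)\} : K \Leftrightarrow \Gamma^\circ \vdash (R(c_1, K)\{R(\Gamma)\})^\circ : K^\circ$.
By transitivity, $\Gamma \vdash c_1 : K \Leftrightarrow \Gamma^\circ \vdash (R(c_1, K)\{R(\Gamma)\})^\circ : K^\circ$.
Similarly, $\Gamma \vdash c_2 : K \Leftrightarrow \Gamma^\circ \vdash (R(c_2, K)\{R(\Gamma)\})^\circ : K^\circ$.

Since the singleton-free system is a subsystem of the full system, we have
by algorithmic completeness that
$\Gamma^\circ \vdash (R(c_1, K)\{R(\Gamma)\})^\circ : K^\circ \Leftrightarrow \Gamma^\circ \vdash (R(c_2, K)\{R(\Gamma)\})^\circ : K^\circ$.
Hence, by symmetry and transitivity, $\Gamma \vdash c_1 : K \Leftrightarrow \Gamma \vdash c_2 : K$. (Note that by
applying transitivity, we have swept away the peculiarity noted above.)
Therefore $\Gamma \vdash c_1 = c_2 : K$ by algorithmic soundness.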



This completes the proof. 

5 Related Work and Conclusions

The primary purpose of this work is to allow the reification of type equality infor- 
mation in a type-preserving compiler for a language like Standard ML, thereby 
eliminating the need to complicate the metatheory of the latter phases of the 
compiler with singleton kinds. Within this architecture, equality (or “sharing”) 
information would initially be expressed using singleton kinds, but at some point 
singleton kind elimination would be exploited to eliminate them. Thereafter, with 
singleton kinds no longer available, type information would be propagated by 
substitution, as in Harper et al.

Shao proposes a different approach for dealing with type equality in
module languages. Shao’s approach resembles the approach in this paper, in that
it substitutes definitions for variables. However, it does so less thoroughly than
the approach here, since, in keeping with the module-based accounts, less type
information is to be propagated than in the singleton account, as mentioned
in Section 2.1. In effect, Shao’s substitution does not account for the issue of
internal bindings discussed here in Section 3.1.

Another alternative is given in an earlier paper by Shao. In his earlier
approach, equality specifications are taken as mere abbreviations and deleted from
signatures. The main work arises in ensuring that the appropriate subsignature 
relationships hold: a signature containing a type abbreviation must be considered 
a subsignature of a similar one that contains that type but not the abbrevia- 
tion (as required by Standard ML and the standard type-theoretic accounts). 
To accomplish this, when a structure matching a signature with a deleted field 
is used in a context where that deleted field is required, the translation coerces 
the structure to reinsert the deleted field. Thus, Shao’s earlier approach differs 
from the one here in two main ways: it interprets the subsignature relation by 
coercion, whereas this paper’s approach interprets it by inclusion; and (as with 
the later approach) it does not account for indirect equalities resulting from in- 
ternal bindings — abbreviation occurs only where equality specifications appear 
syntactically. 

Aspinall studies in detail a related type system with singleton types. The
difference between singleton kinds and his singleton types is entirely cosmetic
(this work could just as easily be presented as singleton type elimination), but
various other technical differences between his system and this one make it
unclear whether the same elimination process would apply to his system as well.
Stone and Harper compare this system to Aspinall’s in greater detail.

An implementation of this paper’s singleton kind elimination procedure in 
the context of the TILT compiler is planned, but has not yet been done. The 
main challenge we anticipate in this implementation is that singleton kinds,
in addition to expressing type equality information from the module language,
are also very useful for expressing type information compactly. The elimination
of singleton kinds could thus substantially increase the space taken up by type
information. (In the limit, a particularly naive implementation could result in
exponential blowup of type information by breaking DAGs into trees.) This issue
could arise in two ways: first, type information could take up more space in the
compiler, resulting in slower compilation, and, second, if types are constructed
and passed at run time, inefficient type representation could result in poor
performance at run time. Shao et al. discuss a number of ways to deal
with the former issue, such as hash-consing and using explicit substitutions. The
latter issue can be addressed by making the construction and passing of type
information explicit and doing so before performing singleton elimination;
then singleton elimination will have no effect on the run-time version of type
information.
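
The parenthetical remark about breaking DAGs into trees can be illustrated
with a small, self-contained sketch. The encoding and function names are
hypothetical and unrelated to TILT. A chain of n definitions, each pairing
the previous type with itself, has n + 1 nodes when sharing is kept, but
2^(n+1) - 1 nodes once every definition is substituted out.

```python
# Types as tagged tuples (hypothetical encoding): ("int",) or ("pair", l, r).

def build(n):
    """t_0 = int, t_{i+1} = t_i * t_i, reusing the same object for both sides."""
    t = ("int",)
    for _ in range(n):
        t = ("pair", t, t)
    return t

def tree_size(t):
    """Node count when the type is read as a tree (sharing ignored)."""
    return 1 + sum(tree_size(child) for child in t[1:])

def dag_size(t):
    """Node count when physically shared subterms are counted once."""
    seen = set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for child in node[1:]:
            visit(child)
    visit(t)
    return len(seen)

t10 = build(10)
sizes = (dag_size(t10), tree_size(t10))
```

For `build(10)` the DAG has 11 nodes while the tree reading has 2047, which
is the kind of gap that a naive substitution-based elimination would have to
pay for.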
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A Inference Rules 



Well-Formed Context 



rhok 



£ h ok 



( 1 ) 



r\- K a(f Dom(r) 
P, a:K h ok 



(2) 



Context Equivalence 



h A = U2 



I- e = e 



(3) 



h El = E 2 a h Ki = K 2 a(f Dom(A) 



h Fi,a:Ki = F 2 , a:K2 



(4) 
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Well-Formed Kind 



r'r K 



Eh ok 

ri-T 



(5) 



Subkinding 



Kind Equivalence 



F h c : F 


(6) 


F h S(c) 


F, a:K' h K" 


(7) 


r h na:K'.K" 


r, a:K' h K" 


(8) 


Fh Ea:K'.K" 




r\- K <K' 


F h c : F 


(9) 


F h S{c) < T 


Fh ok 


(10) 


F h F < F 


F h Cl = C2 : F 


(11) 


Fh S(ci) < S'(C 2 ) 


F h na:K'i.K” 





r^K’^<K[ r,a:K' 2 h Kj' 

r h na:K[.K” < 

rh Ea:K'2.K2 

r^K[<K'2 r, a-.K[ h K'{ < m 
r h Ea:K[.K[' < Ea:K^.K'^ 

rh Ki = K2 



rh ok 
r h T = T 

F h Cl = C 2 : T 

rhs(ci) = s'(c 2 ) 

r\- K2 = K[ r, a:K[ h K” = K'2 
r na:K{.K” = na-.K2.K2 

r\- K[ = K2 r, a:K[ h K” = K'2 
r Ea:K'i.K'{ = Ea-.K2.K2 

Well-Formed Constructor 



( 14 ) 

( 15 ) 

( 16 ) 
( 17 ) 



F h c : K 



Fh ok 



F h 6 : F 



( 18 ) 
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rh ok 

r h a : r(a) 

r, a:K' 'r C-.K” 
n- Xa:K'.c: Ha-.K' .K” 

The: na:K'.K” F ^ c' : K' 
r h cc' : K”{c'/a} 

The: Sa-.K'.K” 
r h TTic : K' 

The: Ea:K'.K” 
r h 7 T 2 C : if”{ 7 ric/a} 

n- Ea:K'.K" 
r\-ci:K' 
r h C2 : if"{ci/a} 
r h (ci,C2) : Ea:K'.K" 
r h c : T 
The: ^(c) 

n- Ea-.K'.K" 
r h TTic : iC' 

-T h 7 T 2 C : if”{ 7 ric/a} 

The: Ea:K'.K” 

The; na-.K'.K[' 
r, a:K' h ca : 

The: na-.K'.K” 

r h c : 1 r h < 7^2 

n- c : A'2 

Constructor Equivalence 

r, a-.K' h Cl = C2 : -ft"' Eh c'l = c '2 : K' 
r h (Aa:-ft".ci)c'i = C2{c^a} ; -ft"'{c'i/a} 

r h Cl : na:K'.K” 
r h C 2 : na:K'.K2 
r,a:K' h cia = C20 : A"' 
rhci =C2 -.na-.K'.K” 

n- Ea:K'.K" 

r h TTlCl = 7 TiC 2 : -ft"' 

E h 7T2 Ci = 7T2C2 : -ft'"{7TlCl/a} 

r h Cl = C2 : Sa:K'.K" 

r h Cl = c'l : A'l n- C2 : -ft'2 



(19) 

(20) 
(21) 
(22) 

(23) 

(24) 

(25) 

(26) 

(27) 

(28) 
c = c' :K 

(29) 

(30) 

(31) 



r I- 7Ti(ci,C2) = c'l : Ki 



(32) 
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r h Cl : r h C2 = : K2 

r h 7 T 2 (ci, C2) = c'2 : K2 
The: S{c') 



r\-c = c' 


: T 


r\-c = c' 


: T 


r\-c = c' : 


S{c) 


r\-c' = c 


-. K 



r\-c = c' : K 

r\- c = c' : K r\- c' = c” : K 
r h c = c" : K 

rh ok 

r\-b^b:T 
n- ok 

_r h a = a : -T(a) 

r h KJ = K 2 r, a:K:; h Ci = C2 : K" 

r \- Xa:K[.ci = Aa:K^.C2 : na:K'.K" 
F'r c = c' : na-.K1.K2 ri-ci=c'i:Ki 
E h cci = c'c'i : K2{ci/a} 
r h Cl = C2 : Ea-.K'.K” 
r h TTlCl = 7 TiC2 : K' 

r I- Cl = C2 : Ea-.K'.K” 
r h 7 T 2 Ci = 7 T 2 C 2 : -K"{ 7 TlCl/a} 

n- Ea-.K'.K” 
r h ci = c^ : K' 
r h ci' = c'2' : K"{ci/a} 
r h (ci,ci') = (ci,ci') : Ea-.K'.K” 
r h Cl = C 2 : K K <K' 

r h Cl = C2 : K' 
Abstract. The CIL compiler for core Standard ML compiles whole programs
using a novel typed intermediate language (TIL) with intersection
and union types and flow labels on both terms and types. The CIL
term representation duplicates portions of the program where intersec- 
tion types are introduced and union types are eliminated. This dupli- 
cation makes it easier to represent type information and to introduce 
customized data representations. However, duplication incurs compile- 
time space costs that are potentially much greater than are incurred in 
TILs employing type-level abstraction or quantification. In this paper,
we present empirical data on the compile-time space costs of using CIL 
as an intermediate language. The data shows that these costs can be 
made tractable by using sufficiently fine-grained flow analyses together 
with standard hash-consing techniques. The data also suggests that non- 
duplicating formulations of intersection (and union) types would not 
achieve significantly better space complexity. 



1 Introduction 



1.1 The Compile-Time Space Costs of Typed Intermediate 
Languages 



Recent research has demonstrated the benefits of compiling with an explicitly
typed intermediate language (TIL) [Mor95, FKR+99, MWCC99]. One benefit is
that explicit types can be used in compiler passes to guide program
transformations and select efficient data representations. Another advantage of
using a TIL is that the compiler can invoke its type checker after every
transformation, greatly reducing the possibility of introducing errors. If
strongly typed intermediate languages are used all the way through the compiler
to the assembly level (something we do not yet do), the resulting object code is
certifiably type safe. Furthermore, types that survive through the back end can
be used to support run-time operations such as garbage collection and run-time
type dispatch [Mor95].

The benefits of using a TIL are not achieved without costs. These costs 
include the space needed to represent the types at compile-time, the time to ma- 
nipulate the types at compile-time, and the added complications of transforming 
types along with terms. This report focuses on the compile-time space cost. 

Using a naive type representation can incur huge space costs, even if types
are only used in the compiler front end for initial type checking. In the worst
case, the tree representation of types in Standard ML (SML) programs can have
size doubly exponential in the program size, and the DAG representation can
be exponential in the program size [Mit96]. Although we are mainly concerned
with ordinary programs where the worst case space complexity is not
encountered, these ordinary programs often have types with impractically large
tree representations but acceptable DAG representations. So in practice, DAG
representations of types and other techniques are necessary to engineer types of
tractable size. For example, the SML/NJ compiler’s FLINT intermediate
language uses hash-consing, memoization, explicit substitutions, and de Bruijn
indices to achieve space-efficient implementation of types [SLM98]. The TIL
compiler achieves type sharing by binding all types to type variables, and then
performing dead code elimination, hoisting and common subexpression elimination
on the types [Tar96, pp. 217-219]. The compiler must then preserve type
bindings across transformations, or else repeat the type-sharing transformations.
Tarditi reports that the representation size increase imposed by using types in
TIL averages 5.15 times without this sharing scheme, but only 1.93 times with
sharing.
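
Hash-consing, the first of the techniques listed above, can be sketched in a
few lines. This is an illustrative toy, not FLINT's implementation: the class
`HashCons` and its method `mk` are hypothetical names, and the sketch assumes
that all children passed to `mk` were themselves produced by the same table,
so that object identity coincides with structural equality.

```python
class HashCons:
    """Intern table that returns one shared node per distinct type."""

    def __init__(self):
        self._nodes = {}

    def mk(self, tag, *kids):
        # Children are already interned, so their identities form a key
        # that can be hashed in constant time per child.
        key = (tag, *map(id, kids))
        node = self._nodes.get(key)
        if node is None:
            node = (tag, *kids)
            self._nodes[key] = node   # keeps the node alive, so ids stay valid
        return node

    def __len__(self):
        return len(self._nodes)

hc = HashCons()
int_t = hc.mk("int")
a = hc.mk("arrow", int_t, int_t)                # int -> int
b = hc.mk("arrow", hc.mk("int"), hc.mk("int"))  # built again from scratch
c = hc.mk("arrow", a, b)                        # (int -> int) -> (int -> int)
```

Building `int -> int` a second time returns the existing node (`a is b`), and
the table holds three nodes in total even though `c` read as a tree has seven.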

We have constructed a whole-program compiler for core SML based on a
typed intermediate language we call CIL. Unlike FLINT and TIL, CIL has
three features that make compile-time space issues potentially more challenging
to address than in other typed intermediate languages:



1. Listing-based types: The CIL type system can encode polyvariant flow
analyses using polyvariant flow types where labels on type constructors
provide flow information and intersection and union types provide polyvariant
analysis. Intersection and union types can be viewed as finitary
(listing-based) versions of infinitary (schema-based) universal and existential
types.

Footnote: “CIL” is an acronym for “Church Intermediate Language.” The authors
are members of the Church Project (http://types.bu.edu), which is investigating
applications of sophisticated type systems in the efficient and reliable
implementation of higher-order typed programming languages.

For example, CIL uses the intersection type

$\tau_{id} = \wedge\{p_1 : \mathrm{int} \to \mathrm{int},\ p_2 : \mathrm{real} \to \mathrm{real}\}$

to represent an occurrence of the universal type $\forall\alpha.\alpha \to \alpha$ that is
instantiated only at types int and real. The intersection type $\tau_{id}$ is similar in
structure to the CIL product (record) type

$\tau_{funs} = \times\{p_1 : \mathrm{int} \to \mathrm{int},\ p_2 : \mathrm{real} \to \mathrm{real}\}$.

The difference is that a value of type $\tau_{funs}$ is a pair of two possibly distinct
functions having the respective component types while a value of type $\tau_{id}$ is
a single function having both component types. CIL union types (introduced
via $\vee$) are the dual of intersection types; they are listing-based versions of
existential types that are similar in structure to CIL sum (variant) types
(introduced via $+$).

Encoding polyvariant analyses, which analyze a function multiple times relative
to different contexts of use, can introduce components of intersection
and union types that differ only by flow information. For instance, when
encoding polyvariance, an innocuous type like $int \to int$ might expand into

$\vee\{q_1 : int \to int,\ q_2 : \wedge\{r_1 : int \to int,\ r_2 : int \to int\}\}$,

where each arrow carries its own flow annotation. In the function type
notation $\sigma \xrightarrow[\psi]{\phi} \tau$, the annotation $^{\phi}_{\psi}$ is a flow bundle in
which $\phi$ (resp. $\psi$) conservatively approximates the sites in a program that
can be sources, or introduction points (resp. sinks, or elimination points),
for the function values having this type. In this paper, we only show flow
bundles annotating function types, but CIL supports such annotations on
almost all types.

Intersection and union types have several advantages over universal and
existential types as a means of expressing polymorphism [WDMT0X]: (1) by
making usage contexts apparent, they support flow-based customizations in
a type-safe way; (2) finitary polymorphism can type some terms not typable
using infinitary polymorphism, thus potentially allowing some program
transformations to be typable which would not be allowable in a TIL based
on infinitary polymorphism; and (3) the listing-based nature of finitary
polymorphic types can avoid some complications of bound variables in
representing and manipulating quantified types (see Sec. 2.2). There is a space
cost for these benefits: the listing-based nature of finitary polymorphic types,
in combination with flow annotations encoding finer grained types, can
lead to CIL types that are much larger than those expressed via infinitary
polymorphic types.

Assuming whole-program compilation, the finitary polymorphism afforded
by flow types is sufficient to compile SML programs. In this respect, the
CIL SML compiler is similar to monomorphizing whole-program compilers
[TO98, CJW00].
2. Duplicating term representations: CIL represents the introduction of
intersection types by a virtual record, a term that explicitly lists multiple
copies of the same component term that differ only in their flow type
annotations. For example, here is a CIL term that has the type $\tau_{id}$ defined
above:

$\wedge(p_1 = \lambda x^{int}.x,\ p_2 = \lambda x^{real}.x)$

Virtual record components are extracted via virtual projections. Similarly,
values of union type (virtual variants) are introduced via virtual injections
and are eliminated by virtual case expressions, terms whose branches
explicitly list multiple type-annotated versions of the same untyped branch.
Virtual terms that persist until code generation are eliminated at that time. 
Code is generated for only one component of a virtual record and for one 
branch of a virtual case expression, and virtual projections and injections 
disappear entirely. Thus, these virtual term constructs have a compile-time 
space cost but no run-time space (or time) cost. 

Because it makes copies of terms that differ only in type annotations, we call 
CIL a duplicating representation. An advantage of the duplicating approach 
is that type information for guiding customization decisions is locally ac- 
cessible in each copy of a duplicated term. An obvious disadvantage of this 
representation is the duplicated term structure, which is potentially much 
larger than the more compact introduction and elimination forms used for 
universal and existential types. Duplication arises in the CIL compiler whenever
intersection or union types are used. The Type/Flow Inference and
Flow Separation compiler stages discussed in Sec. 2.3 both introduce additional
uses of intersection and union types.

3. Closure types exposing free variable types: CIL does not have universal
or existential types because they hide important information about contexts
of use and encourage uniform data representations rather than customized
ones [WDMT0X]. However, existential types are particularly useful for
abstracting over differences in free variables that are exposed in typed closure
representations for functions of the same source type [MMH96, MWCG98,
CWM98]. In the CIL compiler, these differences are reconciled by injecting
the types of closures into a union type and performing a virtual case dispatch
at the application site [DMTW97]. In a type-erasure semantics, these injections
do not give rise to any run-time code. However, they can potentially
cause a blowup in compile-time space when many functions with different
free variables flow together.

Our approach to closure conversion is similar to that used by TIL-based
compilers that remove higher-order functions via defunctionalization [TO98,
CJW00]. As in the CIL compiler, these compilers use flow analysis to customize
the closure representation for particular application sites. However,
these flow analyses are not integrated into the type system. These defunc- 
tionalizing compilers maintain type correctness during closure conversion by 
injecting closures with different free variables that flow to the same applica- 
tion site into a sum type, and performing a case dispatch on the constructed 
value at the application site. The difference here is that in CIL this can be
done with a mix of virtual and real sum types while in the defunctionali- 
zing compilers all of the sum types must be real and hence require run-time 
analysis. Some defunctionalizing compilers avoid this run-time cost by using 
the appropriate code pointer as a “tag” in the generated object code and 
replacing the case dispatch by a jump, but their type systems do not sup- 
port this as a well typed operation and hence this must be done in the code 
generator after types are dropped. In contrast, in CIL the combination of a 
virtual sum (i.e., union) type with real closure types makes this approach 
well typed. 
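
The listing-based encoding described in item 1 can be made concrete with a small sketch. The Python below is only an illustration under our own assumptions: the names `Base`, `Arrow`, `Listing`, and `instantiate_identity` are hypothetical, flow annotations are omitted, and nothing here is taken from the CIL implementation. It lists the instances of the schema a -> a as an intersection type and counts tree nodes, showing that the listing grows with the number of instantiations.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Base:
    name: str


@dataclass(frozen=True)
class Arrow:
    dom: object
    cod: object


@dataclass(frozen=True)
class Listing:
    # kind is "inter" for a (virtual) intersection, "prod" for a real product
    kind: str
    fields: Tuple[Tuple[str, object], ...]


def instantiate_identity(instances):
    """List the given instances of the schema 'a -> a' as an intersection type."""
    fields = tuple(
        (f"p{i}", Arrow(t, t)) for i, t in enumerate(instances, start=1)
    )
    return Listing("inter", fields)


def size(ty):
    """Count the nodes of a type viewed as a tree (no sharing)."""
    if isinstance(ty, Base):
        return 1
    if isinstance(ty, Arrow):
        return 1 + size(ty.dom) + size(ty.cod)
    return 1 + sum(size(field_type) for _, field_type in ty.fields)


# Hypothetical example: the identity used at int and real only.
tau_id = instantiate_identity([Base("int"), Base("real")])
tau_id_3 = instantiate_identity([Base("int"), Base("real"), Base("bool")])
```

In this sketch each extra instantiation adds one more arrow to the listing, whereas a schema-based universal type would keep a fixed size regardless of how many instances are used.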



1.2 Contributions 

Taken together, listing-based types, duplicating term representations, and clo- 
sure types that expose free variable types raise the specter of compile-time space 
explosion at both the term and the type level. However, preliminary experiments 
with a small benchmark suite indicate that standard hash-consing techniques are 
able to keep the size of CIL types and terms tractable. 

The main contributions of this paper are the following two observations: 

1. Duplicating term representations are practical: Our experiments show 
that, for the flow analyses that we have investigated, the space required
for CIL terms in our benchmarks is always within a factor of 2.1 of (and 
usually significantly closer to) our estimate of a minimal size for a non- 
duplicating TIL. This result is surprising, since we and many others expected 
the duplicating term representation to have a significantly higher space cost. 
Before we obtained these results, we expected that it would be essential
to develop a non-duplicating term representation in which a single term
schema somehow contains multiple flow type annotations. For example, using
the notation of [Pie91], $\tau_{id}$ could be expressed as something like:
for $\alpha \in \{int, real\}.\ \lambda x^{\alpha}.x$. Although this notation is more compact, it makes type
information less accessible and can be tricky to adapt to more complex
situations [WDMT0X]. We have made preliminary investigations into other
representations, e.g., one based on the skeletons and substitutions of [KW99].
Based on the empirical results presented here, we believe that developing a
non-duplicating representation of CIL may not be critical (though it may
still be worthwhile). However, only one of the flow analyses we have ex- 
perimented with to date expresses a non-trivial form of polyvariance, so it 
remains to be seen whether these results hold up in the presence of more 
polyvariant flow analyses. 

2. Finer-grained flow analyses yield smaller types and terms: 

Our experiments indicate that, for some classes of flow analyses, increasing 
the precision of flow analysis can significantly reduce the size of program 
representations in CIL. Benchmarks require the most compile-time space for 
the least precise type-respecting flow analysis (one that assumes that any 
function with a given monomorphic type can flow to any call site applying 
a function with this type). This imprecision leads to union types for closures
that are much larger than necessary. More precise flow analyses can
substantially reduce the size of these closure types.

Flow analysis has similarly been used to reduce the size of closure types in 
monomorphizing and defunctionalizing TIL compilers [TO98, CJW00]. However,
previous work has neither quantified the benefits of using flow analysis
in this context nor studied the effects of different flow analyses on compile- 
time space. We believe that we are the first to present a detailed empirical 
study of the effects of a variety of flow analyses on program representation 
size. 

1.3 Representation Pollution 

In addition to our results about the tractability of compile-time space in the CIL 
compiler, we have preliminary evidence that the compiler may be able to achieve 
one of its main design goals: avoiding representation pollution when choosing 
customized data representations. Representation pollution occurs when a source 
form is constrained to have an inefficient representation because it shares a sink 
with other source forms using the inefficient representation. A complementary
phenomenon occurs with pollution of sink representations. 

As an example of representation pollution, as well as some other issues that
arise in a compiler based on CIL, we will consider the compilation of the untyped
CIL source term in Fig. 1.² The term contains two abstractions, two
applications (denoted by the @ symbol), and a tuple introduction form (introduced
via ×).³ The abstraction (λx.x * 2) flows to both application sites, while
the abstraction (λy.y + a) flows only to the rightmost application site.



let f = (λx.x * 2)
in let g = (λy.y + a)
   in ×(f @ 5, (if b then f else g) @ 7)



Fig. 1. An untyped CIL term 



The diagram in Fig. 2 gives an abstract depiction of a CIL compiler intermediate
representation of the code in Fig. 1 that might emerge from the Type

² We introduce and explain elements of the CIL language on an “as needed” basis
in the context of our examples; readers interested in the details of the language
and its type system should consult the appendix of the companion technical report
[DWM+01].

³ In CIL, as in ML, a tuple is a record with implicit positional labels. In general, the
term notation $P(M_1, \ldots, M_n)$ is a shorthand for $P(f_1 = M_1, \ldots, f_n = M_n)$, where
P ranges over $\times$ and $\wedge$, and $f_1, f_2, \ldots$ is some fixed infinite sequence of distinct field
names. Similarly, the type notation $Q[\tau_1, \ldots, \tau_n]$ is shorthand for
$Q\{f_1 : \tau_1, \ldots, f_n : \tau_n\}$, where Q ranges over $\times$, $+$, $\wedge$, and $\vee$.
Inference/Flow Analysis (TI/FA) stage of the compiler. The TI/FA stage (described
in more detail in Sec. 2.3) computes an approximation of the flow of
values between sources and sinks in the input term and represents the analysis
in the output typing. In this case, the CIL representation of the source term
(λx.x * 2) has been split into the virtual tuple

$\wedge(\lambda_{\{3\}}\, x^{int}.\, x * 2,\ \ \lambda_{\{4\}}\, x^{int}.\, x * 2)$,

which contains one copy of the function for each of the application sites to which
it flows. The notation $\lambda^{\ell}_{\psi}$ denotes an abstraction labelled $\ell$ that may flow to the
sinks whose labels are in the set $\psi$, while $@^{\phi}_{k}$ denotes a sink labelled k to which
abstractions whose labels are in the set $\phi$ may flow. Free variables and λ-bound
variables are superscripted with their type. Terms of the form $(\pi^{\wedge}_{i}\ \Box)$ are virtual
tuple projections that select the ith component of a virtual tuple.

The typing rules of CIL (not detailed here) guarantee that the flow annot- 
ations appearing in CIL types are sound. That is, an abstraction may only be 
applied at sites listed in its sink set, and only the abstractions appearing in the 
source set of an application site may be applied at that site. In Fig. 2, the type
of the first component of the virtual tuple (a flow-annotated $int \to int$) is the type
required for the function position of the application site labelled 3, to which the
function flows. The type of the second component of the virtual tuple (also a
flow-annotated $int \to int$) does not match the type required at its application site, so
this component value must be coerced to the correct type somewhere along the
flow path to the application site. A subtype coercion from a term M of type $\sigma$ to
a supertype $\tau$ of $\sigma$ is witnessed by an explicit term of the form coerce($\sigma$, $\tau$) M.




Fig. 2. A possible result of Type Inference/Flow Analysis 



The typing rules also require that the type erasures of all the components of a 
virtual record and all the branches of a virtual case expression must be the same. 
The type erasure of a term is the untyped term that results from eliminating
all types, labels, and virtual forms (virtual records, virtual projections, virtual
injections, virtual case expressions, and coercions) from the term. This type
erasure constraint guarantees that virtual record components and virtual case 
expression branches are just different typings of the same untyped term and can 
therefore share the same run-time representation if the virtual forms survive to 
the code generation phase. If the compiler elects to customize the representations 
of the components of a virtual record, the virtual record will be reified into a real 
record (by changing $\wedge$ to $\times$ in terms and types) that is explicitly represented
in the run-time code. Similarly, by changing $\vee$ to $+$, the compiler can reify a
virtual case expression to be a real case expression that performs a dispatch on a 
real variant at run-time. The compiler is designed so that reifying virtual forms 
in this manner is type-safe. 
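
The type erasure constraint can be sketched in a few lines of Python. This is a toy model under our own assumptions, not the CIL implementation: the term classes (`Var`, `Num`, `Mul`, `Lam`, `VirtualRecord`, `VirtualProj`) and the tuple encoding of untyped terms are hypothetical, and only a small fragment of the term language is covered. Erasure drops annotations and virtual forms, and it rejects a virtual record whose components do not erase to the same untyped term.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Mul:
    left: object
    right: object


@dataclass(frozen=True)
class Lam:
    param: str
    annotation: str  # type and flow annotation, dropped by erasure
    body: object


@dataclass(frozen=True)
class VirtualRecord:
    components: Tuple[object, ...]


@dataclass(frozen=True)
class VirtualProj:
    index: int
    record: object


def erase(term):
    """Return the untyped term (as nested tuples) underlying a typed term."""
    if isinstance(term, Var):
        return ("var", term.name)
    if isinstance(term, Num):
        return ("num", term.value)
    if isinstance(term, Mul):
        return ("mul", erase(term.left), erase(term.right))
    if isinstance(term, Lam):
        return ("lam", term.param, erase(term.body))
    if isinstance(term, VirtualProj):
        # A virtual projection disappears entirely.
        return erase(term.record)
    if isinstance(term, VirtualRecord):
        erasures = [erase(c) for c in term.components]
        if not erasures or any(e != erasures[0] for e in erasures):
            raise ValueError("virtual record components differ after erasure")
        # All components share one run-time representation.
        return erasures[0]
    raise TypeError(f"unsupported term: {term!r}")


# Two copies of (λx.x * 2) that differ only in their (hypothetical) annotations.
copy_for_site_3 = Lam("x", "int, sinks {3}", Mul(Var("x"), Num(2)))
copy_for_site_4 = Lam("x", "int, sinks {4}", Mul(Var("x"), Num(2)))
split_f = VirtualRecord((copy_for_site_3, copy_for_site_4))
erased_f = erase(VirtualProj(1, split_f))
```

Because both copies erase to the same untyped abstraction, a code generator following this model would only need to emit code for one of them.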

As representation decisions are made during subsequent stages of compilation,
further duplication may occur. Fig. 3 depicts a possible output of the Flow
Separation stage. This stage (described in more detail in Sec. 2.3) introduces new
virtual forms to guarantee that the output of the later Representation
Transformation stage will be well-typed. In Fig. 3, the Flow Separation stage has
split the application site labelled 4 into two application sites. These
applications occur within a virtual case expression, which has the form

$case^{\vee}\ M_{disc}\ bind\ x\ in\ \tau_1 \Rightarrow M_1\ \ldots\ \tau_n \Rightarrow M_n.$

A virtual case expression dispatches to the branch $\tau_k \Rightarrow M_k$ based on the
positional tag k of the discriminant $M_{disc}$, which must have type $\vee[\tau_1, \ldots, \tau_n]$.
Within the chosen branch, the variable x of type $\tau_k$ is bound to the value of $M_{disc}$.

In Fig. 3, the functions formerly flowing to the single application site labelled 4 are
now injected into virtual variants (values of union type $\tau$) via $(in^{\vee}_{i}\ \Box)^{\tau}$, where
$i \in \{1,2\}$ is the positional tag of the variant. These virtual variants both flow
to the discriminant position of the virtual case expression, which chooses one
of the two type-annotated versions of the application h @ 7. Splitting h @ 7 in
this manner gives the compiler the option to use different representations for the
closed abstraction (λx.x * 2) and the open abstraction (λy.y + a).

As with source splitting, this kind of sink duplication increases the size of 
the compile-time representation of the program, but the object code size and 
run-time space costs increase only if some of the virtual variants and virtual 
case expressions are reified in a subsequent compilation stage. Observe that the 
sink duplication introduced by Flow Separation in this example has eliminated 
the need for both of the coercions present in Fig. 2 and will usually reduce the
sizes of flow sets. In general, there are many trade-offs between the amount of
virtual duplication and subtype coercion. The trade-offs are very sensitive to the
granularity of the flow analysis and to the representation customization strategy.

We have developed several strategies for reducing (and in some cases completely
eliminating) representation pollution in the case of function representations
(see Sec. 2.3). More work is necessary to evaluate the run-time aspects of the
customization capabilities of the CIL SML compiler. In a future report we will
present a detailed study of the run-time consequences of compiling with
polyvariant flow types.
Fig. 3. A possible result of Flow Separation



1.4 Outline 

The remainder of this paper is organized as follows. Sec. 2 provides an overview
of the CIL compiler for SML. Sec. 3 presents space-related measurements for
several standard benchmark programs at various phases of compilation. Sec. 4
summarizes our conclusions and describes future work.

2 An Overview of the CIL Compiler 

2.1 The Intermediate Language 

To implement the features of core SML, CIL extends the purely functional
$\lambda^{CIL}$-calculus [WDMT0X] with primitive datatypes, references, arrays, and
exceptions. For details of the syntax and typing rules of CIL, see the companion
technical report [DWM+01]. Although CIL is based on the $\lambda^{CIL}$-calculus, CIL
itself is not a calculus. We have implemented a semantics for CIL, but we have
not written its formal counterpart. While we have proven formal properties like
standardization, subject reduction, and type soundness for the $\lambda^{CIL}$-calculus, we
have not yet established any of these properties for CIL.



2.2 Type and Term Representations 

To keep the sizes of types tractable, the CIL compiler uses hash-consing to re- 
present types as compact directed acyclic graphs instead of as trees. This is 
similar to the type representation in the SML/NJ compiler’s implementation of 
its FLINT intermediate language [SLM98]. One important issue faced in FLINT
is not an issue for CIL. FLINT types have higher-order features such as abstractions
and applications, i.e., a λ-calculus inside the types. Because FLINT types
are identified modulo β-conversion, and because eager β-normalization of types
can lose sharing and do excess work, the hash-consing scheme for FLINT types
uses explicit substitutions and memoization of substitution propagation
steps. Unlike FLINT, the CIL types do not have such higher-order features, so
the CIL hash-consing of types is simpler.

Sets of flow labels are often used by many types and/or terms. A single 
copy of each set is shared by all uses. Using the duplicating representation for 
terms, two CIL term occurrences are rarely structurally equivalent, so we do 
not use hash-consing for terms. However, the types and flow sets annotating 
terms are hash-consed, as described above. Strings, used for record field names 
and constructor names, are also shared by all uses and lists of strings are hash- 
consed. 
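
As an illustration of why hash-consing keeps type sizes tractable, the following Python sketch interns first-order type nodes in a table so that structurally equal types share a single node. It is a generic hash-consing scheme written under our own assumptions (the class `TypeTable` and its methods are hypothetical names), not a description of the data structures actually used in the CIL compiler.

```python
class TypeTable:
    """Hash-consing table: structurally equal types get the same node id."""

    def __init__(self):
        self._ids = {}    # structural key -> node id
        self._nodes = []  # node id -> structural key

    def mk(self, constructor, *children):
        # children are node ids of previously built types
        key = (constructor, children)
        node_id = self._ids.get(key)
        if node_id is None:
            node_id = len(self._nodes)
            self._ids[key] = node_id
            self._nodes.append(key)
        return node_id

    def dag_size(self):
        """Number of distinct nodes actually stored."""
        return len(self._nodes)

    def tree_size(self, node_id):
        """Number of nodes the same type would need as an unshared tree."""
        _, children = self._nodes[node_id]
        return 1 + sum(self.tree_size(child) for child in children)


table = TypeTable()
int_t = table.mk("int")
fun_t = table.mk("->", int_t, int_t)     # int -> int
pair_t = table.mk("x", fun_t, fun_t)     # product of two such functions
nested_t = table.mk("x", pair_t, pair_t)
fun_again = table.mk("->", int_t, int_t)  # reuses the existing node
```

Here the nested product occupies four shared nodes, while its tree form has fifteen; the gap widens as types repeat the same components more often.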

2.3 Compiler Architecture 

The architecture of the CIL compiler [DMTW97] is summarized in Fig. 4. This
section briefly describes the compilation stages depicted in the figure.




Fig. 4. Compiler Architecture 
Defunctorizing, Parsing, Elaboration. Our compiler implementation takes
advantage of existing tools and other freely available SML compilers. The CIL
compiler uses the MLton source-to-source defunctorizer [CJW00] as a prepass
to convert SML into Core SML. It then uses the front end of the SML/NJ
110.03 compiler (somewhat modified) to produce FLINT code. The FLINT code
is translated to untyped CIL code, keeping datatype information on the side to
avoid reinference of recursive types.

Type Inference/Flow Analysis (TI/FA). This stage accepts an untyped CIL
term (plus some of the FLINT type information) as input and returns a typed 
CIL term as output. The typed term encodes a flow analysis that is a conservative 
approximation of the run-time flow. The TI/FA module is parameterized over a 
choice of flow analysis. We currently support five different flow analyses, which 
vary with respect to the precision of the approximation. In this paper, we present 
data from two of these: 

1. The typed source split analysis is a variant of Banerjee’s analysis [Ban97] modified
for shallow subtyping [WDMT0X]; the use of shallow subtyping makes it slightly
less precise than the combination of monomorphization and 0CFA analysis. It
introduces virtual tuples and virtual projections but neither virtual variants
nor virtual case forms.

2. The min type respecting analysis is the least precise flow analysis that is still
type-correct (cf. [JWW97]). It conflates the flow information on all values
of the same flow-erased type. For example, an abstraction of type $int \to int$
will be assumed to flow to every application site whose rator has this type.
This analysis models a monomorphizing compiler in which types carry no
useful flow information.
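
The behaviour of the min type respecting analysis can be sketched as follows. The Python below is a simplified model under our own assumptions: types are plain strings standing for flow-erased types, and the function name `min_type_respecting`, the abstraction names, and the site labels are hypothetical. Every abstraction is assumed to flow to every application site whose operator has the same flow-erased type.

```python
def min_type_respecting(abstractions, call_sites):
    """Map each call site to all abstractions of the same flow-erased type.

    abstractions: {abstraction label: flow-erased type}
    call_sites:   {site label: flow-erased type of the operator position}
    """
    by_type = {}
    for label, erased_type in abstractions.items():
        by_type.setdefault(erased_type, set()).add(label)
    return {
        site: sorted(by_type.get(rator_type, set()))
        for site, rator_type in call_sites.items()
    }


# Hypothetical program with two int functions and one real function.
abstractions = {"f": "int -> int", "g": "int -> int", "r": "real -> real"}
call_sites = {3: "int -> int", 4: "int -> int", 5: "real -> real"}
flows = min_type_respecting(abstractions, call_sites)
```

In this model both int functions are reported at sites 3 and 4, even if only one of them can actually reach site 3; a finer analysis would report a smaller set there, and hence a smaller union type.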

We have also implemented a finer analysis that splits some let and letrec 
definitions based on variable occurrences. Both typed source split and this limited 
let split analysis may be implemented either with shallow subtyping constraints, 
or with equality constraints. Unless specifically stated, we will use these terms 
to refer to the analysis with shallow subtyping constraints. 

The granularity of the flow analysis can greatly affect program size. A coarser 
grained flow analysis will generally show more functions flowing to a given call 
site than will a finer analysis. This can lead to larger union types and more 
branches in virtual case expressions. 

The precision of flow analysis also affects which variables are considered to 
be free, and thus affects the size of environments. The CIL compiler currently 
implements a known function optimization in which an invocation of a function 
whose identity is known at compile time (as determined by flow analysis) com- 
piles to a direct jump. The name of such a known function is not considered to 
be a free variable. A coarser grained analysis will find that fewer functions are 
known, leading to larger environment types.⁴

⁴ The numbers presented in this paper were taken before the known function
optimization was implemented. This optimization further widens the space gap between
coarse grained analyses like min type respecting and finer grained ones like typed
source split.
Representation Choices (RC). This module selects representations for a
function that are adequate for each of the application sites to which it flows. 
Seven different function representation choice strategies have been implemented. 
The uniform strategy represents all functions with closure records having the 
type 



$\times\{code : \{arg : \tau_{arg},\ env : \tau_{env}\} \to \tau_{body},\ env : \tau_{env}\}$,

where the code field contains a closed function and the env field contains a record 
of the values of the free variables of the function. A closure data structure is 
applied to an argument by projecting both fields from the closure record and 
applying the function from the code field to an argument record consisting of (the 
closure conversion of) the actual argument packaged together with the projected 
environment. 
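
The uniform strategy's application protocol can be sketched directly. The Python below is an illustration under our own assumptions (dictionaries stand in for records, and the helper names `make_closure` and `apply_closure` are hypothetical); it uses the two abstractions of Fig. 1 with a sample value for the free variable a.

```python
def make_closure(code, env):
    """Build a closure record with a closed code field and an env field."""
    return {"code": code, "env": env}


def apply_closure(closure, arg):
    """Project both fields, then call the code on an (arg, env) record."""
    code = closure["code"]
    env = closure["env"]
    return code({"arg": arg, "env": env})


def double_code(record):
    # Closed code for (λx.x * 2); the environment is ignored.
    return record["arg"] * 2


def add_a_code(record):
    # Closed code for (λy.y + a); a is fetched from the environment.
    return record["arg"] + record["env"]["a"]


f = make_closure(double_code, {})         # closed function, empty environment
g = make_closure(add_a_code, {"a": 10})   # open function, a = 10 assumed
```

Under this strategy even the closed function f pays for an environment field and an extra projection at every call, which is the cost that the selective strategies try to avoid.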

The other three representation strategies generate specialized representations 
based on various conditions detected in the term structure. Wand and Steckler
[WS94] coined the term “selective” representation to refer to representations of
functions that do not include an environment component. A selective representation
is adequate for a closed function if the function flows only to call sites
with compatible application protocols. In [WS94], selective representations were
disabled in the presence of representation pollution, i.e., when a closed function
shared a call site with some number of open functions. In contrast, the CIL
compiler can still use selective representations in such situations, removing the
pollution via a splitting strategy.

The selective sink splitting strategy implemented in the CIL compiler gene- 
rates a selective representation when the function has no free variables. This 
representation is called “sink splitting” because if the function shares call sites 
with open functions, the transformation framework will inject the function re- 
presentations into a sum type and the application site will be split into multiple 
sites governed by a case dispatch. The transformation of the program depicted 
in Fig. 2 to the one depicted in Fig. 3 is a sample application of the selective sink
splitting strategy. It is also possible that selective sink splitting will cause virtual
records created by TI/FA to be reified into normal records if, e.g., a selective
representation is chosen for a call site in one element of the virtual record, and 
a closure representation is chosen for the corresponding call site in a different 
element of the virtual record. 
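
The effect of selective sink splitting on the program of Fig. 1 can be sketched as follows. This Python model is only an illustration under our own assumptions: the sum type is shown already reified as a tagged pair, the helper names are hypothetical, and the free variable a is passed in as a parameter. The closed function keeps a bare code representation, the open function uses a closure record, and the shared application site becomes a case dispatch with one branch per representation.

```python
def double(x):
    # Selective representation of (λx.x * 2): bare code, no environment.
    return x * 2


def add_a_code(record):
    # Closed code of (λy.y + a) for the closure representation.
    return record["arg"] + record["env"]["a"]


def make_closure(code, env):
    return {"code": code, "env": env}


def apply_split(variant, arg):
    """Split application site: dispatch on the positional tag."""
    tag, rep = variant
    if tag == 1:
        # Branch for the selective representation: call the code directly.
        return rep(arg)
    if tag == 2:
        # Branch for the closure representation: usual closure protocol.
        return rep["code"]({"arg": arg, "env": rep["env"]})
    raise ValueError(f"unknown tag {tag}")


def run(b, a):
    """Model of: let f, g in ×(f @ 5, (if b then f else g) @ 7)."""
    f = double
    g = make_closure(add_a_code, {"a": a})
    h = (1, f) if b else (2, g)  # injections into the sum type
    return (f(5), apply_split(h, 7))
```

Note that the first application, f @ 5, calls the bare code directly, so the closed function is no longer forced into a closure representation by sharing a sink with g.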

The selective source splitting strategy generates a selective representation for 
a closed function flowing to call sites that are not shared with open functions. 
Under this strategy, if a closed function shares some application sites with other 
closed functions but shares other application sites with open functions, then 
the framework will “split the source” by generating a record containing several 
copies of the function. The appropriate representations are projected from the 
record somewhere along the flow path to the respective call sites. 

Other strategies implemented in the CIL compiler include an inlining stra- 
tegy, defunctionalization, and a strategy which disables selective representations 
in the presence of representation pollution. 
The selective sink splitting strategy generates more duplication than the other
strategies for selective closure representation, and is thus of more interest in this
paper.



Flow Separation (FS). This stage accepts as input a typed program and a
flow-path partitioning function supplied by RC. The partitioning function
specifies which flow paths can coexist in the same flow bundles. For flow paths that cannot coexist
in the same bundle, the FS phase will introduce whatever coercions and virtual 
forms (i.e., virtual variant injections, virtual case expressions, virtual tuples, 
or virtual tuple projections) are required to ensure that the result of the later 
Representation Transformation stage will be well-typed. 



Split Reification (SR). This stage accepts as input a typed term and a
flow-path partitioning function supplied by RC. This phase reifies whatever
virtual forms are required to remove representation pollution. We refer to the
reification process as splitting because it causes the code generator to generate
multiple copies of a term in situations where only one copy would have been
generated without reification. In general, the current simple algorithm may split
more than is necessary [DMTW97]. Specifying and implementing a more efficient
splitting algorithm remains for future work.



Representation Transformation (RT). This stage accepts as input a typed 
term and a representation map ($\mathcal{R}$) provided by RC. It walks the term and
installs the function representations specified by the map. The FS stage only in- 
troduces virtual forms, and the SR stage only reifies virtual forms. The RT stage 
performs the actual work of changing the code for specialized representations. 
For instance, in the case of selective closure conversion, it is RT which changes 
some functions to closures, and some call sites to calls to closures. 

An interesting aspect of the transformation is that the result of the transfor- 
mation may have a recursive type even though the source of the transformation 
has no recursion in either terms or types: recursion through flow labels in the 
source term may be enough to cause the transformed term to have a recursive 
type. 



Code Generation. The CIL compiler back end transforms typed CIL pro- 
grams into assembly code for the SPARC processor. It does not currently add 
any type annotations, or assertions, to the assembly code, although this is plan- 
ned for future work. The produced assembly code is linked with a runtime library 
providing the environment in which CIL programs are executed. The back end 
is based on MLRISC, a framework for building portable optimizing code
generators. CIL programs are translated into the MLRISC intermediate
language, and the framework is specialized with CIL conventions for each
target architecture.⁵ MLRISC handles language-independent issues such as register
allocation and code emission.

The runtime library is written in C and provides memory management, ex- 
ception handling, basis functions and a foreign function interface for CIL pro- 
grams at runtime. The runtime library currently manages memory using the 
Boehm-Demers-Weiser conservative garbage collector for C [Boe93]. CIL programs
use stack-allocated activation records, which have a layout similar to C
stack frames. Basis functions are called through the foreign function interface, 
which provides data and activation record conversions between CIL and foreign 
languages. The code generator does not yet optimize tail recursion. 

CIL data representations are straightforward. Records, arrays, references, 
and strings are heap-allocated and include size headers.⁶ Exception identifiers
and all other constants are immediate. Injections may either be immediate or 
heap allocated, depending on the number and type of summands in their type. 

Recursive bindings are restricted to CIL values - terms that cannot diverge, 
affect the store, or raise exceptions. The CIL notion of value is more liberal than 
that of SML; in particular, CIL allows recursive bindings that specify cyclic 
data structures, whereas SML does not. Although input programs must adhere 
to SML restrictions on recursive definitions (because we use the SML/NJ elabo- 
rator), compiler transformations may (and do) create recursive specifications of 
cyclic data structures. The CIL value restriction allows the code generator to use 
a two phase algorithm for recursive bindings: the first phase allocates memory 
for the values, while the second phase fills them in. 
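
The two-phase treatment of recursive bindings can be sketched as follows. This Python model is a sketch under our own assumptions rather than the code generator's actual algorithm: heap records are modelled as lists, the function name `bind_recursive` and the field-spec encoding are hypothetical, and every referenced name is assumed to be bound in the same recursive group. Because the bound terms are values, no field computation can observe a record that is not yet filled in.

```python
def bind_recursive(specs):
    """Build possibly cyclic records from a group of recursive bindings.

    specs maps each bound name to a list of field specs; a field spec is
    ("const", value) or ("ref", name) for a reference to a bound record.
    """
    # Phase 1: allocate an uninitialized record for every binding.
    cells = {name: [None] * len(fields) for name, fields in specs.items()}
    # Phase 2: fill in the fields; all referenced records already exist.
    for name, fields in specs.items():
        for index, (kind, payload) in enumerate(fields):
            cells[name][index] = cells[payload] if kind == "ref" else payload
    return cells


# Hypothetical cyclic structure: xs = (1, ys) and ys = (2, xs).
cells = bind_recursive({
    "xs": [("const", 1), ("ref", "ys")],
    "ys": [("const", 2), ("ref", "xs")],
})
xs, ys = cells["xs"], cells["ys"]
```

After the second phase, xs and ys point at each other, which is exactly the kind of cyclic data structure that could not be built by evaluating the two bindings one after the other.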



3 Representation Measurements 

The main purpose of this paper is to determine whether CIL has acceptable 
compile-time space costs and to evaluate how flow analysis and representation 
strategy combinations affect these costs. This section presents data indicating 
that CIL is tractable as a compiler intermediate language when used with a 
reasonably fine-grained flow analysis. 



3.1 Space Profiles 

We have tested the CIL SML compiler for most combinations of flow analyses 
and function representation strategies on 22 kernels and small benchmarks taken 
from the O’Caml, TIL and SML/NJ benchmark suites. Figures 5 and 6 pre- 
sent space profiles for a geometric weighted average of all our benchmarks, and 
profiles for five individual benchmarks for two flow analyses and two function 
representation strategies. We show data for the uniform function representation 

⁵ Although an advantage of the MLRISC framework is its portability, it still requires
substantial work to port a code generator based on MLRISC. For this reason we
have concentrated only on the SPARC architecture to date.

⁶ Such headers are currently unnecessary since we use a conservative garbage
collector, but we expect to develop customized memory management in the future.

strategy to indicate the amount of data needed to correctly closure convert functions
without customizing representations. We show the selective sink splitting
strategy as an example of a strategy that customizes function representations. 
The typed source splitting flow analysis is currently our most accurate analysis 
that does not split on variable occurrences. The min type respecting flow analy- 
sis is included to show size bloat that can occur when flow analysis provides no 
information beyond the type. 

Each space profile shows intermediate representation size information at va- 
rious CIL compiler stages. The legend in Fig. 5 explains how to interpret the 
data. Of particular importance is the position of the horizontal tick mark found 
in each bar of a profile. The portion of the entire bar below the tick mark is 
our conservative estimate of the space that might be required for a hypothetical 
non-duplicating representation of the term (including the space for type and flow 
information in such a term). The position of the horizontal tick mark is 
computed as the term size ignoring all but the leftmost branches of virtual records and 
virtual case expressions. Ignoring all but the leftmost branches approximates 
the size of a non-duplicating “skeleton” that could be instantiated to the full 
duplicating type representation. Since we do not include any information about 
the non-leftmost branches, we assume that our approximation underestimates 
the true size of a non-duplicating representation. Virtual record nodes and 
virtual case nodes are included in the count because they serve as markers for in- 
tersection type introduction and union type elimination points. We assume that 
such markers would be required in any non-duplicating representation. Virtual 
projection and virtual injection nodes are included to approximate (resp.) the 
markers required for intersection type elimination and union type introduction 
forms. Finally, the count also includes coercion nodes. 
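The tick-mark computation can be sketched as follows. This is a simplified illustration in Python, not the measurement code: it counts nodes rather than bytes, it ignores type and flow annotations, and the `Node` class and the kind names `"virtual_record"` and `"virtual_case"` are assumptions made for the example. The skeleton size keeps each virtual node as a marker but follows only its leftmost branch.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    kind: str
    children: List["Node"] = field(default_factory=list)

VIRTUAL_KINDS = {"virtual_record", "virtual_case"}

def full_size(term: Node) -> int:
    """Node count of the duplicating representation."""
    return 1 + sum(full_size(c) for c in term.children)

def skeleton_size(term: Node) -> int:
    """Node count keeping only the leftmost branch of virtual nodes."""
    kept = term.children[:1] if term.kind in VIRTUAL_KINDS else term.children
    return 1 + sum(skeleton_size(c) for c in kept)

def identity_copy() -> Node:
    return Node("lambda", [Node("var")])

# A polymorphic function used at three types: three virtual copies.
term = Node("app", [
    Node("virtual_record", [identity_copy() for _ in range(3)]),
    Node("var"),
])
```

Here `full_size(term)` is 9 and `skeleton_size(term)` is 5; the ratio of the two plays the role of the bar height relative to the tick mark in the profiles.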

The size information was gathered by adding a function to the SML/NJ 
runtime system which runs the mark stage of the SML/NJ garbage collector 
using a particular object as the root. The function reports the size of all marked 
objects that are reachable from the root object. We present all size information 
in bytes rather than in type or term constructor nodes. We find that the average 
size of our type nodes and of our term nodes for a given benchmark is generally 
in the range of 10 to 12 times the size of a machine word. 
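The measurement idea, marking everything reachable from one root and summing the sizes of the marked objects, can be sketched on a toy heap. The sketch below is written in Python and does not use the SML/NJ runtime interface; `reachable_size` and the dictionary-based heap layout are hypothetical.

```python
def reachable_size(heap, root):
    """heap maps an object name to (size_in_bytes, [names it points to])."""
    marked = set()
    stack = [root]
    total = 0
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue              # shared objects are counted once
        marked.add(obj)
        size, pointers = heap[obj]
        total += size
        stack.extend(pointers)
    return total

toy_heap = {
    "term": (12, ["ty", "sub"]),
    "sub": (8, ["ty"]),           # shares the type node with "term"
    "ty": (16, []),
    "unrelated": (100, []),       # not reachable from "term"
}
```

With this heap, `reachable_size(toy_heap, "term")` is 36: the shared type node contributes once, and unreachable data contributes nothing.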



3.2 Interpretation of the Space Profiles 

Interpreting the size of the untyped term. When compiling small pro- 
grams, the untyped CIL code, U, is smaller than the typed FLINT code, F. For 
benchmark programs of any reasonable size, the untyped CIL code is slightly 
larger than the typed FLINT code. This is due in part to the fact that the 
CIL representation carries more information about records and datatypes than 
does the FLINT representation. Of the profiles shown in this paper, only quad 
shows less space for untyped CIL than for FLINT; in all other cases that we 
show, the untyped CIL code is larger than the FLINT code. While other small 
benchmarks are smaller in untyped CIL than in FLINT, the weighted average 
shows that untyped CIL is usually the bulkier representation. 

Note: An even more conservative approximation of the space required for a non-duplicating 
representation would be the size of the type-erased term. We believe that this is 
unrealistically small. 

Legend: 

F = size of FLINT code. 
U = size of untyped CIL. 
I = size of result of Type Inference / FA. 
S = size of result of Flow Separation. 
R = size of result of Split Reification. 
T = size of result of Representation Trans. 

Fig. 5. Sizes of benchmark phases by strategy and flow analysis I 

Fig. 6. Sizes of benchmark phases by strategy and flow analysis II 

The F and U columns are not quite comparable for several reasons. The F 
column overestimates the size of the FLINT code in the sense that it includes 
the size of FLINT type information. FLINT and CIL also differ in terms of which 
basis functions are compiled with the program and which are pre-compiled in 
the run-time system. 

Columns F and U are independent of the flow analysis or the function re- 
presentation strategy, but are repeated in each profile as reference points. 



Interpreting the output of the Type Inference/Flow Analysis stage. 

Column I shows the size of the typed and flowed term output from the TI/FA 
stage. As illustrated by the representative space profiles, the TI/FA pass can 
expand the size of the term by introducing virtual nodes. In monomorphic 
benchmarks (e.g., boyer2, fft, and frank), term size is only increased by the addition 
of coerce forms that indicate where subtyping is used. In benchmarks with 
polymorphic functions (e.g., life and quad), the TI/FA stage makes one virtual 
copy (using $\wedge$) of each polymorphic function at each flow-erased type at which 
the function is used. 

In the two flow analyses shown, the distance of the tick mark from the top of 
the I bar reflects the amount of type polymorphism in the benchmark. In general, 
the tick mark indicates the amount of polyvariance of the analysis, which, for 
some analyses, may be substantial even for monomorphic code. 



Interpreting the output of the Flow Separation stage. Column S shows 
the size of the output from the FS stage. The FS stage introduces whatever new 
virtual constructs are required to ensure that the result of the (later) RT stage 
will be well-typed. For example, abstractions that share a call site may have 
the same type, up to flow information, after the TI/FA stage, but may differ 
from each other in the number, name and types of free variables. The FS stage 
must create types that differ in structure as well as in flow information for these 
different terms. 

Under the uniform strategy, the growth in size from I to S is due only to 
differences in the environment component of closures - differences that will not 
be reflected in the object code. In other strategies, some of the growth may be 
due to function representations that require different object code. 

The growth in size from I to S depends on the accuracy of the flow analysis. 
In the min type respecting flow analysis, the labels for all abstractions of a given 
(flow erased) type appear in the source label set for each application site for that 
type. This requires the flow separator to introduce larger intersection and union 
types, and to perform more virtual term duplication than would be required for 
a finer flow analysis. This is seen consistently throughout the data, with frank 
being the most dramatic example, and boyer2 being the least dramatic. The 
frank benchmark is a combination of human-written code for a Warren Abstract 
Machine using some curried and higher-order functions, and machine-generated 
code to play a solitaire game on the WAM. The machine-generated code contains 
many different anonymous functions of the same few types but with different free 
variables. The min type respecting flow analysis causes these calls to be conflated. 
The boyer2 benchmark is a tautology checker which has been written in closed, 
uncurried, first-order style. In boyer2, all abstractions are closed up to names 
of known functions, so there are few free variables requiring separation. 
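The conflation caused by the min type respecting analysis can be sketched as follows. The Python function below is an illustrative assumption rather than the analysis implemented in the compiler: it simply gives every application site the labels of all abstractions whose flow-erased type matches the type of the site. The labels, site names, and type strings are hypothetical.

```python
from collections import defaultdict

def min_type_respecting(abstraction_types, site_types):
    """Map each application site to the labels of all abstractions
    that have the same flow-erased type as the applied function."""
    by_type = defaultdict(set)
    for label, ty in abstraction_types.items():
        by_type[ty].add(label)
    return {site: set(by_type[ty]) for site, ty in site_types.items()}

abstractions = {"f1": "int -> int", "f2": "int -> int", "g": "bool -> int"}
sites = {"s1": "int -> int", "s2": "bool -> int"}
labels = min_type_respecting(abstractions, sites)
```

Even if only `f1` ever reaches site `s1` at run time, `labels["s1"]` contains both `f1` and `f2`, so a flow separator working from this information has to account for both closures at that site. Code with many anonymous functions of the same few types, as in frank, makes these sets large.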



Interpreting the output of the Split Reification stage. Column R shows 
the size of the output from the SR stage, which reifies some virtual constructs — 
splitting them to pave the way for different representations that will be installed 
by the Representation Transformation stage. The number of term and type nodes 
remains the same because the transformation is merely changing virtual entities 
to real ones. However, reifying type and term nodes causes the position 
of the tick mark on the bar graph to rise, giving an indication of how much 
reification is performed. 

Under the uniform strategy, the S and R columns show identical tick mark 
positions. This is expected because we implement only a single function calling 
convention for the uniform strategy, and so splits are never necessary. Under the 
selective sink splitting strategy, the position of the tick mark may change upwards 
due to reification of virtual constructions: this is what we expect from splittings 
introduced to circumvent representation pollution and to insert customized data 
representations. This is shown most dramatically in quad (a kernel repeatedly 
applying a doubling function), in which all virtual constructs are reified. In 
contrast, the fft (Fast Fourier Transform) benchmark shows no pollution of 
function representations when compiled with the selective sink splitting strategy. 
Most functions in fft are open, but the control flow structure of fft is quite simple: 
just nested loops, so open functions and closed functions never flow together. 

If we see even a little reification for a strategy, we know that some part 
of the transformed program will use a simpler representation. If this change 
is in an inner loop, then a single reification may dramatically affect program 
performance. To determine the effectiveness of a strategy, we need to show data 
about the performance of the transformed programs — something outside the 
scope of this paper. 

Note: For this paper, known function names are treated as free variables. Enabling 
the known function optimization creates slightly smaller representations. The size 
decrease depends on the accuracy of the flow analysis (about a 5% decrease when 
using the typed source split analysis). 

Note: The size of the term component decreases slightly in some profiles due to asymmetries 
between virtual and real injections in the current implementation (e.g., life, with 
strategy = selective sink splitting and flow analysis = min type respecting). 

Our current SR stage is quite simple: if it encounters two different 
representations in a single virtual construct, then it converts the virtual construct 
into the equivalent real construct. Our current splitting algorithm can oversplit 
because it reifies a virtual form whenever it contains components that require 
different representations. But given an n-way virtual form whose components 
require m < n different representations, the virtual form could be replaced with 
a real form containing m virtual forms. Oversplitting will result in unnecessarily 
duplicated code in the object file. Oversplitting impacts the performance of the 
generated code when the m-way real form could be more efficiently compiled 
than the n-way form. We have neither measured the amount of oversplitting 
arising from the current algorithm nor experimented with other splitting 
algorithms. 
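The alternative to oversplitting described above can be sketched as a grouping step. This Python fragment is a hypothetical illustration, not an algorithm from the compiler: given the n components of a virtual form and the representation each one requires, it forms the m groups that would become the m virtual forms inside one real form.

```python
def group_by_representation(components, representation_of):
    """Partition components so that each group needs one representation."""
    groups = {}
    for component in components:
        groups.setdefault(representation_of[component], []).append(component)
    return list(groups.values())

# Four components (n = 4) needing two representations (m = 2).
representation_of = {"f": "closed", "g": "open", "h": "closed", "k": "open"}
components = ["f", "g", "h", "k"]
groups = group_by_representation(components, representation_of)
```

The result is `[["f", "h"], ["g", "k"]]`: a 2-way real form whose branches are 2-way virtual forms, where the current algorithm would produce a 4-way real form.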



Interpreting the output of the Representation Transformation stage. 

The type information in a closure-converted term is larger than in the pre- 
converted term. This is visible in the profiles for all the benchmarks. Part of this 
growth is in the creation of types for the required closure and argument records. 
Part of this growth is the creation of types for environments. In our framework, 
programs with more open terms will experience more growth in types. 

The introduction of closure and argument records and the storage of free 
variable values in environments causes an increase in term size. In our imple- 
mentation of closure conversion, the major increase in term size is from projec- 
tions from the environment: our implementation puts in a projection from the 
environment wherever a free variable occurs. The creation and destructuring 
of closure and argument records will show different percentage effects in diffe- 
rent benchmarks depending on the relation of the number of abstractions and 
applications to other term constructors. 
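A small count shows why per-occurrence projection dominates the growth. The Python sketch below is an assumption-laden illustration: a function body is reduced to the list of its variable occurrences, and the two helper functions (hypothetical names) count the environment projections needed when projecting at every free-variable occurrence, as in the measured compiler, and when projecting each free variable once per function body.

```python
def projections_per_occurrence(body_occurrences, free_vars):
    """One projection for every occurrence of a free variable."""
    return sum(1 for v in body_occurrences if v in free_vars)

def projections_once_per_body(body_occurrences, free_vars):
    """One projection for each distinct free variable that occurs."""
    return len(set(body_occurrences) & set(free_vars))

# Variable occurrences in the body of: fn x => a * x + a * b
body = ["a", "x", "a", "b"]
free = {"a", "b"}
```

For this body the first scheme inserts 3 projections and the second inserts 2; the gap widens as free variables are reused within a body.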

The boyer2 benchmark has the highest ratio of closed to open terms, so its 
term size grows, essentially, only by introduction of closure and (mostly empty) 
argument records; there are few projections. For this reason, the growth in size is 
relatively small. In contrast, fft has a high percentage growth. Transforming the 
nested looping functions of fft creates closures having large environment records 
and code containing numerous environment projections. 

The change in the position of the tick mark relative to the height of the bar 
from R to T indicates how much expansion occurs in virtual terms relative to 
real terms. The relative position of the tick mark decreases when there is a high 
ratio of virtual to real terms, but can increase when the total growth in the size 
of real terms is larger than that for virtual terms. 



Duplicating vs. nonduplicating intermediate representations. Columns 
I, S, R and T have tick marks showing our estimated lower bound on the size of 
a typed and flowed term in a non-duplicating TIL. The position of the tick mark 
shows that in the benchmark programs presented (and so far in all benchmarks 
that we have tried), for the flow analyses presented, the space used in CIL’s 
duplicating term representation is never more than about twice our estimate 
for a non-duplicating representation. This is both surprising and encouraging. 
However, it remains to be seen whether these results hold up in the presence of 
more polyvariant flow analyses. 

Note: In the time since the measurements reported here were taken, we have modified the 
compiler to project each environment variable only once per function body. 



Program Representation Size in an Intermediate Language 






Coarse vs. fine flow analysis. We have shown that the choice of flow analysis 
can greatly influence the growth in term size needed to produce well-typed 
function representations. The most dramatic example occurs in the benchmark 
frank, where, for the uniform function representation strategy, the min type 
respecting analysis resulted in a size after Flow Separation 5.2 times the size of 
that produced using the typed source split analysis. At the other extreme, the 
benchmark boyer2 shows a slight decrease in overall size from typed source split 
analysis to min type respecting analysis. The min type respecting flow analysis 
yields a smaller number of flow types for the number of underlying flow erased 
types than the typed source split analysis. In the case of boyer2, the slightly 
larger term size using min type respecting analysis is offset by the significantly 
smaller size of the flow types. 

We have accumulated some data so far for the version of typed source split 
using only equality constraints. This analysis can be thought of as performing 
Henglein's "simple" flow analysis [Hen92] over monomorphized code, and is the 
flow analysis used in the RML compiler [TO98]. As expected, profiles generated 
using this analysis show somewhat larger code in many cases than profiles 
generated with the usual typed source split, but are much closer to the profiles 
for typed source split than they are to the profiles for min type respecting. 

We have also implemented an analysis, limited let split, which causes some 
let and letrec bound definitions to be duplicated per occurrence of the bound 
variable, rather than just once per type. In this analysis, benchmarks life after 
the RT stage, and simple after TI/FA stage (but not subsequently), show a ratio 
of CIL code size to non-duplicating TIL code size of 2.1. The code size ratios 
are less than 2 for all other compiler phases and benchmarks in our benchmark 
suite. A study of aggressive nested cloning in a lazy functional language [Fax01] 
shows code size increases of a factor of up to 3 for some benchmarks of up to 
800 lines of code. That study also shows that, when identical clones are merged 
after transformation, the code size increase is only a factor of 1.2. 



The cost of accurate closure types. The profiles give us some idea as to the 
compile-time space cost of accurately representing closure types. With uniform 
function representation and typed source split analysis the growth in size from 
the output of Type Inference/Flow Analysis stage to the output of the Repre- 
sentation Transformation stage shows the space needed for closure types and for 
virtual cases where multiple closures flow together. The size of the RT output 
ranges from 1.03 times the size of the TI/FA output for boyer2 to 2.76 times 
for quad. The ratio of the type sizes is 1.02 for boyer2 and 3.11 for quad; 
quad is atypical, being a very small program constructed to have relatively large 
types. 

4 Conclusions and Future Work 

We have shown that the amount of space used in compiling SML with CIL terms 
and types is practical on our benchmarks for the more precise flow analyses that 
we have investigated. Most importantly, the term sizes in our straightforward 
duplicating representation are never more than about twice our underestimate of 
term sizes using a non-duplicating representation. Transformations that use type 
and flow information on virtual terms to generate customized data 
representations would be more difficult to engineer in a non-duplicating representation. A 
factor of less than two in space is acceptable to avoid further complicating the 
transformations. 

This is the kind of result that requires benchmarking to determine, as it 
depends on the style in which programs are written. It appears that, for the 
human-written and machine-generated programs which we have been able 
to test, (1) the bulk of the program code is not used in a highly polymorphic 
manner, so that a whole-program analysis finding actual polymorphism rather 
than potential polymorphism need not perform too much duplication (this limits 
the number of virtual records created in type inference); and (2) a reasonable flow 
analysis will find that a large percentage of calls in most programs are direct 
calls (this limits the number of virtual cases created in the Flow Separation 
phase for correctness of typed closure conversion, and for pollution removal in 
the selective sink splitting strategy). 

The typical non-trivial growth in size from the result of TI/FA to the result 
of RT is obviously undesirable, and might be smaller in an intermediate repre- 
sentation that could hide environment types with an existential quantifier. This 
raises the question of whether the more precise type information maintained 
in CIL after closure conversion without the $\exists$ type quantifier is useful in terms 
of transforming a program for better run-time performance. If not, we should 
extend CIL with existential types. 

Although the standard technique for hash-consing types sketched earlier is 
the one used to generate the statistics for this paper, we have almost finished 
changing to a new type hash-consing scheme, which we expect to give much 
better performance. The motivation for the new scheme is due to the combination 
of (1) the pervasive use of recursive types in CIL and (2) the fact that the 
CIL type system identifies recursive types with the infinite trees that result 
from unwinding them infinitely. The new scheme represents types as directed 
graphs and implements recursion using cycles. The use of cycles to represent 
recursion automatically causes $\alpha$-equivalent types to be shared: the variable 
names are no longer present, leaving only the structure of the recursive type 
to be stored in this representation. It will also avoid the need to have type 
manipulation special-case the type recursion form (which can currently appear 
anywhere). The new scheme uses a method of incremental DFA minimization 
to maintain the invariant that each possible type is represented by at most 
one node in the graph. This will allow constant-time type equality checking, 
which our current hash-consing scheme does not support due to the possibility 
of differing representations of the same recursive type. 
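The limitation of ordinary hash-consing in the presence of recursive types can be seen in a minimal Python sketch. This is a generic hash-consing table, not the CIL implementation and not the new scheme based on DFA minimization; `TypeTable`, the constructor names, and the binder encoding (`"mu"` with a de Bruijn-style `"var0"`) are assumptions made for the example.

```python
class TypeTable:
    """Hash-consing: structurally equal types get the same node id."""

    def __init__(self):
        self._index = {}   # (constructor, child ids) -> node id
        self.nodes = []    # node id -> (constructor, child ids)

    def mk(self, constructor, *children):
        key = (constructor, children)
        node_id = self._index.get(key)
        if node_id is None:
            node_id = len(self.nodes)
            self._index[key] = node_id
            self.nodes.append(key)
        return node_id

table = TypeTable()
int_t = table.mk("int")
arrow1 = table.mk("->", int_t, int_t)
arrow2 = table.mk("->", table.mk("int"), table.mk("int"))

# mu a. int -> a, and the same type unrolled once: int -> (mu a. int -> a)
rec = table.mk("mu", table.mk("->", int_t, table.mk("var0")))
unrolled = table.mk("->", int_t, rec)
```

For tree-shaped types, equality is a comparison of ids (`arrow1 == arrow2`). The recursive type and its unrolling denote the same infinite tree but receive different ids (`rec != unrolled`), which is the kind of differing representation that prevents constant-time equality in a scheme of this shape.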

Our new method of incremental DFA minimization to represent all types in 
the same graph is similar to a method suggested by Mauborgne [Mau00], but 
was developed completely independently. Our method needs $O(n \log n)$ space 
to store the types, while Mauborgne's needs $O(n^2 \log n)$ space, where $n$ is the 
number of distinct types and some upper bound on the arity of type constructors 
is assumed. Also, even in cases where Mauborgne's method approaches linear 
space complexity, ours will typically use half as much space. 

Encoding more flow analyses in CIL remains an important area for future 
work. Recent work has shown that many standard flow analyses, such as k-CFA 
| bhi91IJW95I^M7| and the cartesian product argument-based analysis 
can be encoded into a type system with intersection and union types and flow 
labels jPPflXIATflflj . However, unlike CIL, these type systems have deep subty- 
ping. We are exploring a translation between deep and shallow subtyping that 
will allow us to employ these recent theoretical results in the CIL compiler. We 
are eager to see how highly polyvariant flow analyses affect our results regarding 
the duplicating term representation. 

There are many areas for improvement in the CIL compiler as a whole. The 
compiler can benefit from many standard optimizations not yet implemented 
(e.g., tuple flattening and loop optimizations) as well as some important non- 
standard optimizations (e.g., the complete removal of polymorphic equality). 
Several existing algorithms can be more efficiently implemented, such as the 
algorithm used in Split Reification. There are also many opportunities for im- 
provement in the representation of the intermediate language. 

We have designed and implemented a general framework for generating cu- 
stomized data representations, but work remains to be done in optimizing those 
representations and developing heuristics for choosing between allowable 
representations. In terms of function representations, we are currently investigating 
function representations that do not close over variables whose values are 
available on the stack (the so-called lightweight closure conversion of [SW97]), 
higher-order uncurrying, removing manipulation of records with known components 
(along the lines of the fictitious data elimination in [Sis99]), and register 
allocation and calling conventions informed by flow information. We have yet 
to explore customized representations for other kinds of data, but CIL is rich 
enough to support flow-directed representation transformations for all types of 
data. 

Finally, we emphasize that this report has focused only on compile-time space 
issues. In the future, we will report on compile-time time complexity as well as 
run-time space- and time-complexity. 



Abstract. We suggest a model for dynamic loading and linking as in 
Java. We distinguish five components in a Java implementation: evalu- 
ation, resolution, loading, verification, and preparation - with their as- 
sociated checks. We demonstrate how these five together guarantee type 
soundness. 

We take an abstract view, and base our model on a language nearer 
to Java source than to bytecode. We consider the following features of 
Java: classes, subclasses, fields and hiding, methods and inheritance, and 
interfaces. 



1 Introduction 

Java’s recent spectacular success is partly due to its novel approach to code 
deployment. Rather than compiling and linking a fixed piece of code for a target 
machine, Java is compiled to bytecode that can be executed on several 
platforms, and can link further code on demand. The security of Java greatly 
depends on type safety. Type safety is ensured by the bytecode verifier, which 
checks that loaded bytecode conforms to the rules of the Java source language, 
and by the verifier’s interplay with the other components of the Java abstract 
machine. 

The bytecode verifier was formalized as a type inference system Eaiani 
I22| . where stack locations have types on a per-instruction basis. m reported 
security flaws due to inconsistencies between loaders, which were rectified in 
later releases, as described in m- An operational semantics for multiple loaders 
is given in in). Thus, various components of Java and the virtual machine have 
been studied at considerable depth in isolation, but, except for this work and 
[12 , '112 7 j their interplay has not yet been formalized. 

We attempt a synthesis, and consider the complete process, consisting of five 
components: evaluation, loading, verification, preparation and resolution. We 
base our model on a language that is nearer to Java source, than to bytecode as 
in |2JI27| . 

* This work was partly supported by EPSRC, Grant ref: GR/L 76709 

R. Harper (Ed.): TIC 2000, LNCS 2071, pp. 53-0 2001. 

Our model is therefore useful for source language programmers: even if they 
do not program in bytecode, and do not download unverified bytecode, they may 
become aware of these issues, and may trigger verification, resolution or load 
errors. Furthermore, a clear understanding of these checks and their interplay 
at a level independent of the bytecode is crucial for the design of new binary 
formats for Java. In fact, while most Java implementations use the class format 
m, any format satisfying the properties outlined in ch. 13.1 of HU may be used 
instead. Last, because our model is at a high level, and independent of Java 
reflection, it demonstrates clearly, through the format of the judgments, how 
components depend on each other. 

We distinguish the checks performed by verification and resolution, and 
demonstrate their dependencies: Resolution checks do not guarantee consistency 
unless applied on verified code, nor are verification checks sufficient unless later 
supported by resolution checks. Our model clarifies which situation will throw 
which exceptions, a question that is not unambiguously answered in H32U> and 
demonstrates how execution of unverified code may corrupt the store. 



1.1 Overview of Java Dynamic Linking and Loading, and of Our 
Formalization 

In traditional programming languages, e.g., Ada, Modula-2, the compiler checks 
all type-related requirements, and produces code which does not contain type 
information. If the various components of the program code have been compiled 
in an order consistent with their dependencies (dependencies through imports or 
inheritance) then execution is expected to be sound with respect to types. Before 
execution, the code is linked eagerly, and all external references are resolved 
and type-checked. Execution therefore has the form 

$$e, \sigma, \mathit{Code} \;\leadsto\; e', \sigma', \mathit{Code}$$

i.e., takes place in the context of fixed Code, and modifies the expression and 
the store. 

Java, on the other hand, does not require the complete program to have been 
linked before execution. During execution, a type (i.e., class or interface) may 
be needed which is not in the current code. If bytecode for the type can be found 
and verified, then the code is enriched with the new type. Furthermore, Code 
consists of a verified, prepared part $P$, and a loaded part $L$, which was loaded 
in order to support verification of $P$. We consider language $\mathcal{L}$, which stands for 
loaded binary programs, and $\mathcal{P}$, which stands for verified and prepared binary 
programs. 

Therefore, we describe execution in terms of expressions $e$, states $\sigma$, verified 
code $P$, and loaded but not verified code $L$. It has the general form 

$$e, \sigma, P, L_1 L_2 \;\leadsto\; e', \sigma', P P_1, L_2 L_3$$

thus describing that the expression may be rewritten, the state may be modified, 
code may be loaded, and some of the loaded code may be verified and prepared; 
the terms $L_1 L_2$, $P P_1$, and $L_2 L_3$ indicate concatenation of $\mathcal{L}$ or $\mathcal{P}$ code. 

Note: By compiling modified Java classes without recompiling all importing classes one 
may obtain bytecode that does not verify. Also, execution sometimes does not 
attempt to verify local classes. 



An Abstract Model of Java Dynamic Linking and Loading 






We classify execution into the following five components: 

— evaluation corresponds to execution as in most programming languages, 

— resolution is the process of resolving references to fields and methods, 

— loading is the process of loading types required for further execution, 

— verification is the process of verifying $\mathcal{L}$ code, 

— preparation turns verified $\mathcal{L}$ code into $\mathcal{P}$ code. 

Evaluation is the execution that is unaffected by the dynamic linking nature 
of Java, e.g., assignment to variables, loops, conditionals, parameter passing, etc. 
Resolution applies the offsets of the static type stored in field access or method 
calls to an object or to the dynamic class of the receiver. 

Loading loads types (i.e., class bodies or interface descriptors) necessary for 
the verification of further classes, or for the resolution of field access and method 
calls. A loader exception is thrown if the type cannot be found. Verification 
checks that the subtype relations required in some expressions are satisfied, but 
does not check the presence of fields or methods referred to in some piece of 
code. This is checked only when and if the method or field is accessed; if these 
cannot be found, then a resolution exception is thrown. A verification exception 
is thrown if verification is not successful. Preparation determines the object and 
method lookup table layout for classes, ensuring that the offsets for inherited 
fields and methods coincide with those of the superclasses. 
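The division of labour between loading, verification, preparation, and resolution can be sketched as a toy machine. The Python below is a loose illustration under stated assumptions and not the formal model of this paper: it covers classes and fields only, omits evaluation, methods, and interfaces, and all names (`Machine`, the `"super"`, `"fields"`, and `"needs"` keys, the exception classes) are hypothetical. As in the description above, verification checks required subtype relations but not the presence of fields; a missing field is detected only at resolution.

```python
class LoadErr(Exception):
    pass

class VerifErr(Exception):
    pass

class NoFldErr(Exception):
    pass

class Machine:
    def __init__(self, repository):
        self.repository = repository  # class name -> description (dict)
        self.classes = {}             # descriptions of all loaded classes
        self.loaded = set()           # loaded but not yet verified
        self.prepared = {}            # verified and prepared: name -> layout

    def load(self, name):
        if name in self.classes:
            return
        if name not in self.repository:
            raise LoadErr(name)
        self.classes[name] = self.repository[name]
        self.loaded.add(name)

    def is_subtype(self, sub, sup):
        # Walks the superclass chain, loading classes on the way.
        current = sub
        while current is not None:
            self.load(current)
            if current == sup:
                return True
            current = self.classes[current].get("super")
        return False

    def verify_and_prepare(self, name):
        if name in self.prepared:
            return
        self.load(name)
        description = self.classes[name]
        parent = description.get("super")
        if parent is not None:
            self.verify_and_prepare(parent)
        for sub, sup in description.get("needs", []):
            if not self.is_subtype(sub, sup):
                raise VerifErr(f"{sub} is not a subtype of {sup}")
        # Preparation: inherited fields keep the superclass offsets.
        inherited = self.prepared[parent] if parent is not None else []
        own = [(name, f) for f in description.get("fields", [])]
        self.prepared[name] = inherited + own
        self.loaded.discard(name)

    def resolve_field(self, static_class, field_name):
        self.verify_and_prepare(static_class)
        try:
            return self.prepared[static_class].index((static_class, field_name))
        except ValueError:
            raise NoFldErr(f"{static_class}.{field_name}") from None

repository = {
    "A": {"needs": [("C", "B")]},   # a method body in A needs C <= B
    "B": {"fields": ["f"]},
    "C": {"super": "B"},
}
machine = Machine(repository)
machine.verify_and_prepare("A")     # loads A, then C and B for the check
state_after_A = (set(machine.prepared), set(machine.loaded))
machine.verify_and_prepare("C")     # verifies and prepares B, then C
offset = machine.resolve_field("B", "f")
```

After verifying `A`, only `A` is prepared while `B` and `C` are loaded but unverified; they are verified and prepared later, when `C` is needed. Resolving field `f` of `B` yields offset 0, and resolving a field that `B` does not define raises `NoFldErr`.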

In Java literature, the term linking describes resolution, verification and 
preparation. Java resolution is particularly interesting: It takes place at run- 
time, but has both a static and a dynamic part: it depends on the particular 
(dynamically loaded) classes or interfaces mentioned in the corresponding signa- 
ture, and on the particular object which appears as receiver in the corresponding 
expression - more later. 



An example. We demonstrate these components in terms of an example, which 
is also outlined in Fig. 1. We also use some of our notation, which we will 
introduce formally in later sections. 

Consider the following high-level view of a bytecode method call: 

new A[A, int, void].m(3) 

which stands for the call of a method m, with receiver new A, and argument 3. 
The signature [A, int, void] indicates that m is defined in class A, takes an int 
parameter, and returns void. 

We start with configuration (1), i.e., the prepared code is $P$, the loaded code 
is $L$, and the store is $\sigma$. Assume that the above expression had been verified, 
but that class A was not defined in $L$, nor in $P$. Since an object of class A has 
to be created, class A needs to be loaded and verified. If A cannot be found, a 
loader error, LoadErr, is thrown. Otherwise, A is loaded, and $L$ is extended by 
$L_A$. Assume also that class A had a unique method 

void m(int x){ B aB; aB = new C; aB[B,int].f = x }. 

Note that the term aB[B, int].f indicates selection from aB of a field f defined in 
class B with type int. 

Note: Method calls in Java bytecode contain the signature of the method. 

(1) new A[A,int,void].m(3), $\sigma$, $P$, $L$ 

      load A, found? 
        NO, then LoadErr 
        YES; assume $L_A = \mathit{ld}(A, P, L)$ 
          and assume that class A contains unique method 
          void m(int x){ B aB = new C; aB[B,int].f = x } 

(2) new A[A,int,void].m(3), $\sigma$, $P$, $L L_A$ 

      attempt to verify A 
        attempt to establish C subtype B 
          load C and superclasses 
          found? NO, then LoadErr 
                 YES 
          C subtype B? 
                 NO, then VerifErr 
                 YES, assume C direct subclass of B, i.e., $L_B L_C = \mathit{ld}(C, P, L)$ 
          then established $P, L \vdash C \leq B \diamond L_B L_C$ 
        thus verified A, i.e., $P, L \vdash L_A \diamond L_B L_C$ 
        thus can prepare A, thus $P_A = \mathit{pr}(L_A, P)$ 

(3) new A[A,int,void].m(3), $\sigma$, $P P_A$, $L L_B L_C$ 

      create an A object, $\sigma' = \sigma[a \mapsto A\ldots]$, $a$ new in $\sigma$ 

(4) a[A,int,void].m(3), $\sigma'$, $P P_A$, $L L_B L_C$ 

(5) aB = new C; aB[B,int].f = x, $\sigma''$, $P P_A$, $L L_B L_C$ 

      attempt to verify C 
        success? NO, then VerifErr 
                 YES, assume $P P_A, L \vdash L_B L_C \diamond \emptyset$ 
                      assume $P_B P_C = \mathit{pr}(L_B L_C, P P_A)$ 

(6) aB = new C; aB[B,int].f = x, $\sigma''$, $P P_A P_B P_C$, $L$ 

(7) a'[B,int].f = 3, $\sigma'''$, $P P_A P_B P_C$, $L$ 

      find offset of field int f in class B 
        found? NO, then NoFldErr 
               YES, assume $\phi$ is the offset determined from B, int, $P P_A P_B P_C$ 
      store 3 at $a' + \phi$, i.e., $\sigma'''' = \sigma'''[a' + \phi \mapsto 3]$ 

(8) 3, $\sigma''''$, $P P_A P_B P_C$, $L$ 

Fig. 1. Example of loading, verification, resolution, evaluation, and preparation 
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We now have configuration (2). Class A needs to be verified, and so all method 
bodies, and all superclasses of A will be verified, and all required subtype re- 
lationships will be checked. In our example, verification of the method body in 
A requires class C to be a subtype of B. Assume that C has not been loaded 
yet. Then, either a loader error will be thrown (LoadErr) or C will get loaded 
together with all its superclasses. Assume that the superclasses of C only include 
B. So, we have established that C is a subtype of B, while loading Lb and Lc. 
In terms of our formalism, we have established P, L h[, C < B LbLc. This 
gives successful verification while loading Lb and Lc. In terms of our formal- 
ism, P, L hi La O LbLc. Note that Lb and Lc are loaded but not verified. 
Then prepare A, obtaining PA=pr(LA,P), which contains the information from 
La extended by offset information. We replace La by Pa, and load Lb and Lc. 

We arrive at configuration (3). Then we create the A object. The new state, 
σ', contains the new object at address α, whose first cell indicates its class, 
namely A. 

We obtain configuration (4). We then execute the method call. This requires 
resolution, i.e., looking up the offset of the method in class A stored in the 
signature, and applying this offset to the method lookup table of the class 
of the object stored at α. In this example, the two classes coincide. The method 
body is { B aB; aB = new C; aB[B,int].f = x }. 

This leads to configuration (5). Execution of the expression new C requires 
verification and preparation of the classes B and C. If verification fails, then 
a verification error is thrown. Otherwise, assume that verification did not 
require loading of any further classes, i.e., P, L L_A ⊢_v L_B L_C ◇ ⇝ L_∅, and that 
preparation of C and B gives P_C P_B, i.e., P_C P_B = pr(L_C L_B, P). 

This leads to configuration (6). We then create a new C object at the new 
address α' and obtain store σ'''. 

After some steps, we obtain configuration (7), assuming that x in σ''' 
contains the value 3. For the assignment α'[B,int].f = 3, the field access α'[B,int].f 
has to be resolved. If class B does not have a field f of type int, then the 
resolution exception NoFldErr is thrown. Otherwise, resolution returns φ, the offset of 
int f from class B. This offset is used to access the field in the object at α'. The 
object at α' happens to belong to class C, which is different from B. But because 
C is a subclass of B, and because preparation guarantees that the object layout 
of a class conforms to that of a superclass, class C will have inherited the field at 
the same offset as in B. And so the assignment will not break the consistency 
of the object. 

This brings us to configuration (8). 

If, however, the method body had not been verified and C was not a subclass 
of B, or if resolution did not read the offsets properly, or if preparation did not 
preserve the object layout from superclasses, then the integrity of the object 
could be violated (more on that in section 3.1). 

term                              meaning 

L ∈ 𝓛                             loaded code 
P ∈ 𝒫                             prepared code 
e                                 a term (identical in 𝓛 and 𝒫) 
σ                                 a store, mapping identifiers and addresses to identifiers or integers 

𝒯(t, P), 𝒯(t, L)                  the superclasses/superinterfaces of t in L, P 
ℳ(m, c, t2, t1, L)                the body of method m, with argument type t2 and result type t1, in class c 
ℱ_off(f, c, t, P)                 the offset of field f with type t in class c 
ℱs(c, P)                          all fields with types and offsets, defined or inherited in class c 
ℳ_off(m, c, t2, t1, P)            the offset of method m, with argument type t2 and result type t1, in class c 
ℳ_e(φ, c, P)                      the method body at offset φ in class c 
ℳ^i_off(m, i, t2, t1, P)          the offset of method m, with argument type t2 and result type t1, in interface i 

e, σ, P, L1L2 ⇝ e', σ', PP1, L2L3 e rewrites to e', and σ rewrites to σ'; 
                                  prepared code augmented by P1, new code L3 loaded 

⌈·⌉^exp                           expression context, propagates to sub-expression 
⌈·⌉^nll                           null context, may throw exception 
⌈·⌉^typ                           type context, may cause loading and verification 

P, L ⊢ c ≤_clss c'                c is a subclass of c' in context of P, L 
P, L ⊢ c ≤_impl i                 c implements i in context of P, L 
P, L ⊢ i ≤_intf i'                i is a subinterface of i' in context of P, L 
⊢ P, L ◇_a                        the subclass/subinterface relationship in P, L is acyclic 
⊢ P, L ◇_sups                     P, L contain all supertypes of types defined in PL 

P, L ⊢_v L' ◇ ⇝ L''               verifier checks that L' is well formed in context P, L, and loads L'' 
P, L ⊢_v t ≤ t' ⇝ L'              verifier checks that t widens to t' in context P, L, and loads L' 
P, L, E ⊢_v e : t ⇝ L'            verifier checks that e has type t in context P, L, while loading L' 

E                                 environment for the declaration of variables 

P, L ⊢ t ≤ t'                     t widens to t' in the context of prepared P, and loaded L 
P, L, E ⊢ e : t                   e has type t in the context of prepared P, and environment E 
P, L ⊢ P' ◇                       P' is well-formed in the context of P and L 
L ⊢ P ◇                           P is well-formed in the context of L 

σ, P ⊢ β : t                      value β conforms weakly to type t in context of P 
σ, P ⊢ α ◇                        the object stored at α in σ is well-formed (conforms strongly) 
P, E ⊢ σ ◇                        all objects in σ are well-formed, and agree to their declarations in E 
(runtime typing)                  runtime expression e has type t in store σ in the context of P, L, E 

ld(t, P, L)                       loading 
pr(L, P)                          preparation 

Fig. 2. Concepts defined in this paper 

Thus, the above example demonstrates 

— classes may be loaded without being verified, 

— execution of verified code may throw loader, resolution or verification errors, 

— verification checks subtype relationships, and does not guarantee the 
presence of methods or fields, 

— resolution checks the presence of methods and fields, 

— verification and resolution checks complement each other. 



The treatment of interfaces. In order to establish that required subtype 
relationships are satisfied, verification looks up the appropriate classes. However, 
if the required subtype relationships involve interfaces, then these relationships 
are automatically assumed to hold and are not checked! 

Apparently overawed by the multiplicity of parents possible in a Java 
interface hierarchy, the implementors of Sun’s verifier ... abdicated re- 
sponsibility for type checking involving the use of interfaces. Instead, 

..., the burden of checking for compatibility, ... passed implicitly to the 
runtime system. 

Philipp Yelland ^ 

Thus, at runtime these subtype requirements need to be checked, and execution 
of interface method calls will check the satisfaction of the associated subtype 
relationship. Again, we see that checks from two different JVM components 
complement each other, and in slightly different ways for classes than for inter- 
faces. 

An example is given in the appendix. 



Organization of this paper. In figure 2 we list all judgments and functions 
defined in the paper, with a brief description of their intention, and the place 
of their definition. In section 2 we introduce 𝓛 and 𝒫 for the description of 
loaded or prepared code. In section 3 we describe an operational semantics, and 
distinguish the five components. In section 4 we define consistency of states with 
prepared code, and types for runtime expressions, and we state subject reduction 
and progress lemmas. In section 5 we give a summary and outline alternatives, 
and in section 6 we draw conclusions, compare with other work, and introduce 
some open questions. 

Hand-written proofs are available at http://www.doc.ic.ac.uk/~scd/proofs. 



^ In the particular example, all loaded classes are eventually verified, but it would take 
a slight modification (e.g., putting the creation of the C object in a conditional) for this 
not to be the case. 

^ The latter while attempting to verify further classes. 

2 The Languages 𝓛 and 𝒫 

The languages 𝓛 and 𝒫 present an abstract view of the Java bytecode. For the 
sake of simplicity, we only consider classes, subclasses, interfaces, subinterfaces, 
assignment, method overloading and inheritance, field inheritance and hiding. 
We chose these features because inheritance with fields allows for an interesting 
notion of consistent state, inheritance with method calls and fields demonstrates 
the interplay between resolution and verification, and interfaces pose the same 
requirement as classes, but are treated differently. We do not model super. Even 
though our examples use sequential statements, we have not included them in 
the 𝓛- and 𝒫-syntax, as they can be easily encoded by extra methods. Also, 
all methods have one argument; multiple arguments can be encoded through 
objects. 

Expressions. Figure 3 contains the syntax of expressions in 𝓛 or in 𝒫 programs. 

Field accesses and instance or class method calls are annotated by 
signatures. Field access has the form e1[t1,t2].f, where t1 is the class containing the 
field definition, and t2 the type of that field. Instance method calls have the 
form e1[t1,t2,t3].m(e2), where t1 is the class containing the method definition, 
t2 is the type of the method's argument, and t3 is the result type. Similarly, 
interface method calls have the form e1[t1,t2,t3]^i.m(e2), where t1 is the interface 
containing the method header, t2 is the argument type, and t3 is the result type. 

The only types we consider are classes, interfaces, and int; these demonstrate 
several interesting properties of the Java system. Interfaces introduce multiple 
subtyping. More interestingly, subtyping introduced through interfaces is dealt 
with differently from subtyping introduced through subclassing: as we shall see, 
the verifier assumes an interface to be a supertype of any type, whereas it consid- 
ers a class to be a supertype of its loaded subclasses only; conversely, at runtime 
subclasses are not checked for instance method calls, but subtypes are checked 
for interface method calls. Also, the type int and the address calculations dur- 
ing execution open the possibility of pitfalls, which, as we shall demonstrate, are 
averted by verification and the resolution checks. 

Values are either integers, or addresses of objects. Addresses are represented 
by positive numbers and are denoted by α, α', etc.; the null pointer is denoted by 
0. Values, whether they stand for addresses or for integers, are denoted by β, β', 
etc. 

Contrary to Java source language rules, 𝓛- and 𝒫-methods may have the 
same identifier and argument type but different result type as a method from 
a superclass. Such binaries may be created, e.g., through compilation of a class 
and its subclass, subsequent addition of a method in the superclass, and 
recompilation of the superclass without recompilation of the subclass. The method 

® £ is a similar language to language Java,r.itn|1 or the Java subset from m ; it is 
larger than m because it considers imperative features, overloading and interfaces; 
and, though at a different abstraction level than m, it is larger because it studies 
interfaces. 

^ Corresponding to the bytecode instructions getfield, putfield, invokevirtual and 
invokeinterface. 

e ∈ Expr      ::=  e[t,t,t].m(e)               method call 
               |   e[t,t,t]^i.m(e)             interface method call 
               |   v = e                       assignment 
               |   new c  |  this              object creation, receiver 
               |   v  |  β                     variable, integer value 
               |   NllPErr  |  LoadErr         null-pointer err., load err. 
               |   VerifErr  |  ClssChngErr    verification err., class change err. 
               |   NoFldErr  |  NoMethErr      field not found, method not found 

v             ::=  e[t,t].f                    field access 
               |   z                           parameter 

t ∈ Typ       ::=  c  |  i  |  int             class, interface or integer 

φ ∈ Offs                                       offsets (positive numbers) 
χ ∈ ErrOffs   ::=  -1                          member undefined 
               |   -2                          type of wrong kind 
               |   -3                          type undefined 

α ∈ Addr      ::=  0  |  φ                     address 
β ∈ Val       ::=  α  |  -1  |  -2  |  ...     value 

c, i ∈ Id                                      c class names, i interface names 
m, f, z ∈ Id                                   m method names, f field names 

Fig. 3. The syntax of expressions 



calls will then be disambiguated through the result type of the signature. For 
example, x[c1,t2,t3].m2(...) selects from class c1 the method with parameter type 
t2 and result type t3, whereas x[c1,t2,t4].m2(...) selects from class c1 the method 
with parameter type t2 and result type t4. 



Language for loaded code, 𝓛. Rather than give the syntax of 𝓛 and 𝒫 
programs, we describe these through functions that look up the 
superclasses, superinterfaces, fields and methods of a class or interface. ℘(A) denotes 
the powerset of A. 

Definition 1 The tuple (𝓛, 𝒯, ℳ, L_∅) is a language for loaded code, iff 

— 𝓛 is a set. 

— 𝒯 is a function, 𝒯 : Id × 𝓛 → (Id × ℘(Id)) ∪ ℘(Id) ∪ {ε}. 

— ℳ is a function, ℳ : Id × Id × Typ × Typ × 𝓛 → Expr ∪ {ε}. 

— L_∅ ∈ 𝓛, and ∀t ∈ Id: 𝒯(t, L_∅) = ε. 

— for any L1, L2 ∈ 𝓛, their concatenation, L1L2, gives a further element of 𝓛, 
with: 

• 𝒯(t, L1L2) = 𝒯(t, L1) if 𝒯(t, L1) ≠ ε, 𝒯(t, L2) otherwise. 

• ℳ(m, c, t2, t3, L1L2) = ℳ(m, c, t2, t3, L1) if 𝒯(c, L1) ≠ ε, 
ℳ(m, c, t2, t3, L2) otherwise. 

In the above, L_∅ indicates the empty program in 𝓛, and ε indicates lookup of 
a non-existing entity. 𝒯(t, L) is intended to return the direct superclass of t and 
the (possibly empty) set of its direct superinterfaces, if t is declared as a class in 
L; or to return the (possibly empty) set of the direct superinterfaces, if t is declared 
as an interface in L; and ε otherwise. ℳ(m, c, t2, t1, L) is intended to return the 
body of method m defined in class c, with result type t1 and argument type t2, 
or ε if no such method is found. Note that we have no functions looking up the 
fields, nor any functions looking up entities in interfaces. This is so because 
these are not used for verification, and so, in our setting, can be considered as 
non-existing in 𝓛 code. 

For example, assume classes B and C, where class B has fields int f1 and C f2, and 
method int m(D x){4 + 44}, and where C extends class B, implements interfaces I2 
and I4, and has field int f1 and method int m(D x){777}. It would be 
represented through L_BC, with 𝒯(B, L_BC) = Object, { }, 𝒯(C, L_BC) = B, {I2, I4}, and 
for method lookup: ℳ(m, B, D, int, L_BC) = 4 + 44, ℳ(m, C, D, int, L_BC) = 777. 
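
The lookup functions and the concatenation rule of definition 1 can be illustrated with a small executable sketch. It is not part of the formal development: the dictionary encoding, the class name `Loaded`, and the helper `concat` are assumptions made purely for illustration, and `None` stands in for the "non-existing entity" result.

```python
EPS = None  # stands for the result of looking up a non-existing entity


class Loaded:
    """A loaded program: type declarations plus method bodies (illustrative encoding)."""

    def __init__(self, types=None, methods=None):
        # types:   class name -> (direct superclass, frozenset of direct superinterfaces)
        # methods: (m, c, argument type, result type) -> method body
        self.types = dict(types or {})
        self.methods = dict(methods or {})


def T(t, L):
    """Direct supertypes of t in L, or EPS if t is not declared in L."""
    return L.types.get(t, EPS)


def M(m, c, t2, t3, L):
    """Body of m in class c with argument type t2 and result type t3, or EPS."""
    return L.methods.get((m, c, t2, t3), EPS)


def concat(L1, L2):
    """L1 L2: for every type declared in L1, L1 wins; otherwise L2 is consulted."""
    types = {**L2.types, **L1.types}
    methods = {k: b for k, b in L1.methods.items() if k[1] in L1.types}
    methods.update({k: b for k, b in L2.methods.items() if k[1] not in L1.types})
    return Loaded(types, methods)


# The example program with classes B and C from the text.
L_BC = Loaded(
    types={"B": ("Object", frozenset()), "C": ("B", frozenset({"I2", "I4"}))},
    methods={("m", "B", "D", "int"): "4 + 44", ("m", "C", "D", "int"): "777"},
)
```

Keying methods by name, class, argument type and result type mirrors the role of the signature: two methods that differ only in result type remain distinguishable.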

In [8] we gave the syntax of a language for which we defined by construction 
functions corresponding to 𝒯 and ℳ. Therefore, definition 1 is well-formed. 
From now on, we expect (𝓛, 𝒯, ℳ, L_∅) to stand for a fixed language for loaded 
code. 



Language for prepared code, 𝒫. The language 𝒫 describes code after 
preparation; the programs are extended by method and field lookup tables. We 
model this by offsets, which are positive numbers, denoted by φ, φ' ∈ Offs, while 
χ ∈ ErrOffs indicate non-existing entities, or entities of the wrong kind. 

Definition 2 A tuple (𝒫, 𝒯, ℳ_off, ℱ_off, ℳ_e, ℳ^i_off) is a language for prepared code 
iff 

— 𝒫 is a set. 

— 𝒯 is a function, 𝒯 : Id × 𝒫 → (Id × ℘(Id)) ∪ ℘(Id) ∪ {ε}. 

— ℳ_off is a function, ℳ_off : Id × Id × Typ × Typ × 𝒫 → Offs ∪ ErrOffs. 

— ℱ_off is a function, ℱ_off : Id × Id × Typ × 𝒫 → Offs ∪ ErrOffs. 

— ℳ_e is a function, ℳ_e : Offs × Id × 𝒫 → Expr ∪ {ε}. 

— ℳ^i_off is a function, ℳ^i_off : Id × Id × Typ × Typ × 𝒫 → {0} ∪ ErrOffs. 

— ∀ c, c', m, t, t', P: 𝒯(c, P) = c', {...} ⟹ ℳ^i_off(m, c, t, t', P) = -2. 

— ∀ i, m, f, t, t', P: 𝒯(i, P) = {...} ⟹ ℱ_off(f, i, t, P) = -2, ℳ_off(m, i, t, t', P) = -2. 

— ∀ t, m, f, t1, t2, P: ℳ_off(m, t, t1, t2, P) = -3 ⟺ ℱ_off(f, t, t1, P) = -3 
⟺ ℳ^i_off(m, t, t1, t2, P) = -3 ⟺ 𝒯(t, P) = ε. 

— P_∅ ∈ 𝒫, and ∀t ∈ Id: 𝒯(t, P_∅) = ε. 

— For any P1, P2 ∈ 𝒫, their concatenation, P1P2, gives a further element of 𝒫, 
with 

• 𝒯(t, P1P2) = 𝒯(t, P1) if 𝒯(t, P1) ≠ ε, 𝒯(t, P2) otherwise. 

• ℳ_off(m, c, t2, t3, P1P2) = ℳ_off(m, c, t2, t3, P1) if 𝒯(c, P1) ≠ ε, 
ℳ_off(m, c, t2, t3, P2) otherwise. 

• ℳ_e(φ, c, P1P2) = ℳ_e(φ, c, P1) if 𝒯(c, P1) ≠ ε, ℳ_e(φ, c, P2) 
otherwise. 

• ℱ_off(f, c, t, P1P2) = ℱ_off(f, c, t, P1) if 𝒯(c, P1) ≠ ε, ℱ_off(f, c, t, P2) 
otherwise. 

• ℳ^i_off(m, i, t2, t3, P1P2) = ℳ^i_off(m, i, t2, t3, P1) if 𝒯(i, P1) ≠ ε, 
ℳ^i_off(m, i, t2, t3, P2) otherwise. 

The function 𝒯 has the same intention as in def. 1. The functions 
ℳ_off(m, c, t2, t1, P) and ℱ_off(f, c, t, P) are intended to return the offset of method m 
defined in class c with argument type t2 and return type t1, or the offset of field 
f defined in class c with type t; if c does not contain m or f, then they should 
return -1; if c is the name of an interface, they should return -2; and if P does 
not define any type for c, they should return -3. The function ℳ_e(φ, c, P) looks 
up the method body in class c using offset φ, while ℳ^i_off(m, i, t2, t1, P), used for 
interface method call, indicates whether the method m with argument type t2 
and result type t1 is defined in interface i; it should return 0 if i defines such a 
method, -1 if i does not define such a method, -2 if i is the name of a class, 
and -3 if P does not define any type for i. 

For the program L_BC from before, a possible corresponding 𝒫 program is 
P_BC where, for all t ∈ Typ: 𝒯(t, P_BC) = 𝒯(t, L_BC), and ℱ_off(f1, B, int, P_BC) = 3, 
ℱ_off(f2, B, C, P_BC) = 5, ℳ_off(m, B, D, int, P_BC) = 2, ℳ_e(2, B, P_BC) = 4 + 44, and 
for class C, we would have ℱ_off(f1, C, int, P_BC) = 6, ℳ_off(m, C, D, int, P_BC) = 2, 
ℳ_e(2, C, P_BC) = 777. 
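
As a rough illustration of the prepared-code lookups and their error codes, the following sketch encodes the example program with plain dictionaries. The encoding, the function names `F_off`, `M_off`, `M_e` and `M_off_intf`, and the interface `I2` with its method `run` are assumptions introduced here only to exercise every case; they are not taken from the formal definition.

```python
MEMBER_UNDEFINED, WRONG_KIND, TYPE_UNDEFINED = -1, -2, -3

P_BC = {
    "classes": {
        "B": {"fields": {("f1", "int"): 3, ("f2", "C"): 5},
              "methods": {("m", "D", "int"): 2},
              "bodies": {2: "4 + 44"}},
        "C": {"fields": {("f1", "int"): 6},
              "methods": {("m", "D", "int"): 2},
              "bodies": {2: "777"}},
    },
    # Hypothetical interface, present only so that the -2 and 0 cases can be shown.
    "interfaces": {"I2": {("run", "int", "int")}},
}


def F_off(f, c, t, P):
    """Offset of field f of type t defined in class c, or an error code."""
    if c in P["interfaces"]:
        return WRONG_KIND
    if c not in P["classes"]:
        return TYPE_UNDEFINED
    return P["classes"][c]["fields"].get((f, t), MEMBER_UNDEFINED)


def M_off(m, c, t2, t1, P):
    """Offset of method m (argument type t2, result type t1) in class c, or an error code."""
    if c in P["interfaces"]:
        return WRONG_KIND
    if c not in P["classes"]:
        return TYPE_UNDEFINED
    return P["classes"][c]["methods"].get((m, t2, t1), MEMBER_UNDEFINED)


def M_e(phi, c, P):
    """Method body stored at offset phi in the lookup table of class c, or None."""
    cls = P["classes"].get(c)
    return None if cls is None else cls["bodies"].get(phi)


def M_off_intf(m, i, t2, t1, P):
    """0 if interface i declares m with this signature, otherwise an error code."""
    if i in P["classes"]:
        return WRONG_KIND
    if i not in P["interfaces"]:
        return TYPE_UNDEFINED
    return 0 if (m, t2, t1) in P["interfaces"][i] else MEMBER_UNDEFINED
```

Note that B and C store the method m at the same offset 2, which is what later allows an offset computed from the static class to be applied to an object of a subclass.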

In [8] we gave the syntax of a language for which we defined constructively 
functions corresponding to 𝒯, ℳ_off, ℱ_off, ℳ_e, and ℳ^i_off. Thus, definition 2 
is well-formed. From now on, we assume (𝒫, 𝒯, ℳ_off, ℱ_off, ℳ_e, ℳ^i_off) to be a given 
language for prepared code. 

Combined 𝓛 and 𝒫 code. We define functions to collect all types declared 
in 𝓛, or 𝒫, or combined code, and to collect all fields and all methods in such 
code. 

Definition 3 For L ∈ 𝓛, P ∈ 𝒫, c ∈ Id with 𝒯(c, P) = c', {...}, we define: 

- 𝒯(t, P, L) = 𝒯(t, P) if 𝒯(t, P) ≠ ε, 𝒯(t, L) otherwise. 

- 𝒯s(L) = {t | 𝒯(t, L) ≠ ε}, 𝒯s(P) = {t | 𝒯(t, P) ≠ ε}, 
𝒯s(P, L) = {t | 𝒯(t, P) ≠ ε or 𝒯(t, L) ≠ ε}. 

- ℳs(c, L) = { t1 m(t2 x){e} | e = ℳ(m, c, t2, t1, L) ≠ ε }. 

- ℱs(c, P) = {(t, f, φ) | ℱ_off(f, c, t, P) = φ} ∪ ℱs(c', P), ℱs(Object, P) = ∅. 



Thus, 𝒯s(L_BC) = {B, C}, and ℳs(C, L_BC) = { int m(D x){777} }, and 
ℱs(C, P_BC) = { (int, f1, 3), (C, f2, 5), (int, f1, 6) }. 

Lemma 1 For programs L, L1, L2 ∈ 𝓛, and P, P1, P2 ∈ 𝒫: 

- 𝒯s(L1) ∩ 𝒯s(L2) = ∅ ⟹ L1L2 = L2L1. 

- 𝒯s(P1) ∩ 𝒯s(P2) = ∅ ⟹ P1P2 = P2P1. 

- 𝒯s(L2) ⊆ 𝒯s(L1) ⟹ L1L2 = L1. 

- 𝒯s(P2) ⊆ 𝒯s(P1) ⟹ P1P2 = P1. 

- 𝒯s(P, L) = 𝒯s(P) ∪ 𝒯s(L). 
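
The first and third properties of the lemma can be checked on a toy model in which a program is just a table from type names to declarations and concatenation is left-biased. This is only a sanity check under that assumed encoding (the names `concat`, `Ts`, `L1`, `L2`, `L3` are illustrative), not a proof.

```python
def concat(X1, X2):
    """Left-biased concatenation: a type declared in X1 hides its declaration in X2."""
    return {**X2, **X1}


def Ts(X):
    """The set of types declared in X."""
    return set(X)


# Three small programs; each maps a class name to its direct superclass.
L1 = {"A": "Object"}
L2 = {"B": "Object", "C": "B"}
L3 = {"B": "A"}  # redeclares B differently from L2

# Disjoint declared types: the order of concatenation does not matter.
disjoint_commute = concat(L1, L2) == concat(L2, L1)

# Ts(L3) is a subset of Ts(L2): appending L3 after L2 changes nothing.
subset_absorbed = concat(L2, L3) == L2

# With overlapping types the order does matter, since the left program wins.
overlap_differs = concat(L3, L2) != concat(L2, L3)
```

The left bias is the essential design point: once a class has been loaded or prepared, a later declaration of the same name cannot replace it.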

^ Remember that φ > 0, by definition. 

3 Execution 

Execution, described in fig. 4, is defined in terms of a rewriting relationship on 
configurations, consisting of expression e, store σ, prepared code P, and loaded 
binary L. The expression and store may be modified, more code may be prepared, 
and further code may be loaded. Thus, execution has the form e, σ, P, L1L2 ⇝ 
e', σ', PP1, L2L3. 

In order to give a more concise description of the rewrite semantics, and 
also in order to distinguish between routine rewrite rules and those particular 
to the Java implementation, in fig. 5 we introduce three kinds of contexts. Expression 
contexts, ⌈·⌉^exp, are filled with a sub-expression; their execution propagates 
execution to this sub-expression, as in rule Propagate. Null-contexts, ⌈·⌉^nll, 
when filled with 0, raise an exception when executed, as in rule NullPointErr. 
Type contexts, ⌈·⌉^typ, are filled with a type name; their execution causes the 
type to be loaded and prepared if the type is not part of the loaded or the 
prepared code, as in rules Load, LoadErr, Verif, VerifErr and VerifAndPrep. 

We call an expression ground if it is a value β, and l-ground if it is an 
identifier, or has the form α[t1,t2].f. 



3.1 The Runtime Model 



States represent stacks and heaps, and contain values for identifiers and 
addresses. Addresses point to objects. An object consists of its class (an identifier) 
and values for its fields. These are either int values (Val), or addresses (Addr). As 
Addr is the set of positive numbers, Addr ⊂ Val. The symbol ε means undefined. 
The sets Val, Id, and {ε} are disjoint. Stores thus have the form: 

σ : [ Id → ( Val ∪ {ε} ) ] ∪ [ Addr → ( Val ∪ Id ∪ {ε} ) ]. 

The store lookup σ(z) or σ(α) describes the value of variable z, or address 
α in σ; if σ(α) = c ∈ Id then α points to an object of class c. The fields of the 
object stored at address α are stored at some offset from α. 

We say that an address α is new in σ iff ∀β ≥ 0 : σ(α + β) = ε. 

Our model of the store is therefore at a lower level than those found in studies 
of the verifier |2SC2I23|, where objects are indivisible entities, and where there 
are no address calculations. This lower level model allows us to describe the 
potential damage when executing unverified code; as in the following example. 

On the other hand, our definition of states requires the distinction of the sets 
Addr, Id and {ε}, and so it is at a higher level than plain bitstrings. Even though 
"real" memory contains such plain bitstrings, a faithful modeling of this aspect 
would not have promoted the study of Java dynamic linking. 



^ Observe that type contexts do not include the argument or result type of a method; 
thus, an unverified expression e1[t1,t2,t3].m(e2) could be executed without ever 
loading t2 or t3. 

^ and thus, implicitly, also if it is an address α. 

^ We could have represented the distinction between Val, Id, and {ε} through a tagged 
union, but this would have cluttered the presentation. 

Evaluation 

  Propagate 
      e, σ, P, L ⇝ e', σ', P', L' 
      ------------------------------------------------ 
      ⌈e⌉^exp, σ, P, L ⇝ ⌈e'⌉^exp, σ', P', L' 

  NullPointErr 
      ⌈0⌉^nll, σ, P, L ⇝ NllPErr, σ, P, L 

  Acc 
      z a variable 
      ------------------------------------------------ 
      z, σ, P, L ⇝ σ(z), σ, P, L 

  VarAss 
      z = β, σ, P, L ⇝ β, σ[z ↦ β], P, L 

  New 
      c ∈ 𝒯s(P) 
      α new in σ 
      ℱs(c, P) = {(t1, f1, φ1), ..., (tn, fn, φn)} 
      σ' = σ[α ↦ c, α+φ1 ↦ 0, ..., α+φn ↦ 0] 
      ------------------------------------------------ 
      new c, σ, P, L ⇝ α, σ', P, L 

Resolution 

  FldAcc1 
      ℱ_off(f, t1, t2, P) = φ 
      ------------------------------------------------ 
      α[t1,t2].f, σ, P, L ⇝ σ(α+φ), σ, P, L 
      α[t1,t2].f = β, σ, P, L ⇝ β, σ[α+φ ↦ β], P, L 

  FldAcc2 
      ℱ_off(f, t1, t2, P) = -1 
      ------------------------------------------------ 
      α[t1,t2].f, σ, P, L ⇝ NoFldErr, σ, P, L 
      α[t1,t2].f = β, σ, P, L ⇝ NoFldErr, σ, P, L 

  FldAcc3 
      ℱ_off(f, t1, t2, P) = -2 
      ------------------------------------------------ 
      α[t1,t2].f, σ, P, L ⇝ ClssChngErr, σ, P, L 
      α[t1,t2].f = β, σ, P, L ⇝ ClssChngErr, σ, P, L 

  MethCall1 
      ℳ_off(m, t1, t2, t3, P) = φ 
      ℳ_e(φ, σ(α), P) = e 
      y1, y2 are fresh variables in σ 
      e' = e[y1/x, y2/this] 
      σ' = σ[y1 ↦ β, y2 ↦ α] 
      ------------------------------------------------ 
      α[t1,t2,t3].m(β), σ, P, L ⇝ e', σ', P, L 

  MethCall2 
      ℳ_off(m, t1, t2, t3, P) = -1 
      ------------------------------------------------ 
      α[t1,t2,t3].m(β), σ, P, L ⇝ NoMethErr, σ, P, L 

  MethCall3 
      ℳ_off(m, t1, t2, t3, P) = -2 
      ------------------------------------------------ 
      α[t1,t2,t3].m(β), σ, P, L ⇝ ClssChngErr, σ, P, L 

  IntfMethCall1 
      P, L ⊢ σ(α) ≤_impl t1 
      ℳ^i_off(m, t1, t2, t3, P) = 0 
      ------------------------------------------------ 
      α[t1,t2,t3]^i.m(β), σ, P, L ⇝ α[σ(α),t2,t3].m(β), σ, P, L 

  IntfMethCall2 
      ℳ^i_off(m, t1, t2, t3, P) = -1 
      ------------------------------------------------ 
      α[t1,t2,t3]^i.m(β), σ, P, L ⇝ NoMethErr, σ, P, L 

  IntfMethCall3 
      ℳ^i_off(m, t1, t2, t3, P) = -2 
      ------------------------------------------------ 
      α[t1,t2,t3]^i.m(β), σ, P, L ⇝ ClssChngErr, σ, P, L 

  IntfMethCall4 
      P, L ⊬ σ(α) ≤_impl t1 
      ------------------------------------------------ 
      α[t1,t2,t3]^i.m(β), σ, P, L ⇝ ClssChngErr, σ, P, L 

Loading 

  LoadErr 
      t ∉ 𝒯s(P, L) 
      ------------------------------------------------ 
      ⌈t⌉^typ, σ, P, L ⇝ LoadErr, σ, P, L 

  Load 
      e = ⌈t⌉^typ 
      t ∉ 𝒯s(P, L) 
      ld(t, P, L) = L' ≠ L_∅, for a loader ld 
      ------------------------------------------------ 
      e, σ, P, L ⇝ e, σ, P, LL' 

Verification 

  VerifErr 
      t ∈ 𝒯s(L) \ 𝒯s(P) 
      ------------------------------------------------ 
      ⌈t⌉^typ, σ, P, L ⇝ VerifErr, σ, P, L 

Preparation 

  VerifAndPrep 
      e = ⌈t⌉^typ 
      t ∈ 𝒯s(L1) \ 𝒯s(P),   ⊢ P, L1 ◇_sups 
      P, L1L2 ⊢_v L1 ◇ ⇝ L' 
      P1 = pr(L1, P), for a preparation pr 
      ------------------------------------------------ 
      e, σ, P, L1L2 ⇝ e, σ, PP1, L2L' 

Fig. 4. Execution 







⌈·⌉^exp  ::=  ⌈·⌉[t,t,t].m(e)  |  α[t,t,t].m(⌈·⌉) 
          |   ⌈·⌉[t,t].f 
          |   ⌈·⌉ = e      if ⌈·⌉ is a non-l-ground variable 
          |   v = ⌈·⌉      if v is an l-ground variable 

⌈·⌉^nll  ::=  ⌈·⌉[t,t,t].m(e)  |  ⌈·⌉[t,t,t]^i.m(e) 
          |   ⌈·⌉[t,t].f  |  ⌈·⌉[t,t].f = β 

⌈·⌉^typ  ::=  α[⌈·⌉,t].f  |  new ⌈·⌉ 

Fig. 5. Contexts 



An example. For the program P_BC from section 2 and a class A subclass of 
Object and without any fields, the following store σ0 maps identifier anA to an 
object of class A, and aB to an object of class B: 

σ0(anA) = 2      address of object 
σ0(2)   = A      object of class A 
σ0(aB)  = 5      address of object 
σ0(5)   = B      object of class B 
σ0(8)   = 45     field int f1 from B 
σ0(10)  = 11     field C f2 from B 
σ0(11)  = C      object of class C 
σ0(14)  = 55     field int f1 from B 
σ0(16)  = 0      field C f2 from B 
σ0(17)  = 65     field int f1 from C 
σ0(y)   = ε      for the other y's 

Thus, 18 is new in σ0, but 15 is not, even though σ0(15) = ε. 
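
The example store and the notion of a new address can be written down directly. The sketch below is illustrative only: it assumes a Python dictionary as the store, `None` for the undefined value, and reads "new" as "no defined cell at this address or beyond it"; the names `sigma0`, `lookup` and `is_new` are chosen here and do not appear in the formal model.

```python
# The example store: identifiers and addresses map to values or class names.
sigma0 = {
    "anA": 2, 2: "A",             # anA points to an object of class A
    "aB": 5, 5: "B",              # aB points to an object of class B
    8: 45, 10: 11,                # fields f1 and f2 of the B object
    11: "C",                      # an object of class C
    14: 55, 16: 0, 17: 65,        # its fields f1 (from B), f2 (from B), f1 (from C)
}


def lookup(sigma, key):
    """Value stored for a variable or an address; None plays the role of 'undefined'."""
    return sigma.get(key)


def is_new(alpha, sigma):
    """True if no cell at address alpha or at any higher address is defined."""
    return all(not (isinstance(k, int) and k >= alpha) for k in sigma)
```

Here `is_new(18, sigma0)` holds, while `is_new(15, sigma0)` does not, because cells 16 and 17 are still occupied by fields of the C object even though cell 15 itself is undefined.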

Consider the expression e3 = anA[B,int].f1 = 12. Because A is not a 
subclass of B, expression e3 does not verify. But if we either switched the verifier 
off or managed to fool the verifier, and executed e3, σ0, P_ABC, L_∅, we would 
obtain 12, σ1, P_ABC, L_∅, where σ1 = σ0[5 ↦ 12]. In the new store, σ1, the class 
of the object at address 5 has been overwritten by an integer; the consistency 
of the store has been destroyed! Thus, resolution checks alone do not ensure 
"well-behavedness" either. 
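
The damage described above can be replayed step by step. The following sketch assumes the same dictionary encoding of the store as an illustration, hard-codes the offset 3 of field f1 in class B, and uses a hypothetical helper `put_field` that performs the raw address calculation without any check on the receiver's class.

```python
sigma0 = {
    "anA": 2, 2: "A",
    "aB": 5, 5: "B", 8: 45, 10: 11,
    11: "C", 14: 55, 16: 0, 17: 65,
}

OFFSET_F1_IN_B = 3  # offset of field int f1 in class B, as in the example program


def put_field(sigma, alpha, offset, beta):
    """Unchecked field write: store beta at address alpha + offset."""
    updated = dict(sigma)
    updated[alpha + offset] = beta
    return updated


# anA[B,int].f1 = 12, executed although the object at anA has class A, not B.
alpha = sigma0["anA"]                  # address 2
sigma1 = put_field(sigma0, alpha, OFFSET_F1_IN_B, 12)

class_cell_before = sigma0[5]          # "B": the class of the object at address 5
class_cell_after = sigma1[5]           # 12: the class cell has been overwritten
```

The write lands on address 2 + 3 = 5, which is exactly the cell recording the class of the neighbouring B object.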

In the appendix we give an example which demonstrates the treatment of 
interfaces, based on the one given by Buechi [2]. 

We now study the five components of execution. Note that the five 
components are "disjoint", in the sense that for any configuration, if a rule from one 
component is applicable, then no rule from another component is applicable. 



3.2 Evaluation 

Evaluation is the part of execution that is not affected by dynamic linking and 
verification. It is described in the first section of fig. 4, and it comprises: 

^ The expression e3 could be the result of compilation of expression e4 = anA.f1 = 12 
in a context where anA had type A, and A was a subclass of B; then class A was 
modified so that it no longer was a subclass of B, A was recompiled, and e4 was not 
recompiled. 

  𝒯(c, P, L) = c', {i1 ... in},  n ≥ 0 
  ------------------------------------------------ 
  P, L ⊢ c ≤_clss c 
  P, L ⊢ c ≤_clss c' 
  P, L ⊢ c ≤_impl ij    ∀j ∈ 1...n 

  P, L ⊢ c ≤_clss c'' 
  P, L ⊢ c'' ≤_clss c' 
  ------------------------------------------------ 
  P, L ⊢ c ≤_clss c' 

  P, L ⊢ Object ≤_clss Object 

  𝒯(i, P, L) = {i1 ... in},  n ≥ 0 
  ------------------------------------------------ 
  P, L ⊢ i ≤_intf i 
  P, L ⊢ i ≤_intf ij    ∀j ∈ 1...n 

  P, L ⊢ i ≤_intf i' 
  P, L ⊢ c ≤_impl i 
  P, L ⊢ c' ≤_clss c 
  ------------------------------------------------ 
  P, L ⊢ c' ≤_impl i' 

  P, L ⊢ c ≤_clss c'  ⟹  c' = Object, or 𝒯(c', P, L) ≠ ε 
  P, L ⊢ i ≤_intf i'  ⟹  𝒯(i', P, L) ≠ ε 
  P, L ⊢ c ≤_impl i   ⟹  𝒯(i, P, L) ≠ ε 
  ------------------------------------------------ 
  ⊢ P, L ◇_sups 

  ⊢ P, L ◇_sups 
  P, L ⊢ c ≤_clss c',  P, L ⊢ c' ≤_clss c  ⟹  c = c' 
  P, L ⊢ i ≤_intf i',  P, L ⊢ i' ≤_intf i  ⟹  i = i' 
  ------------------------------------------------ 
  ⊢ P, L ◇_a 

Fig. 6. Subclasses, acyclic programs, programs with complete superclasses 



— propagation, i.e., propagating execution to the receiver and then the argument 
of a method call, to the receiver of a field access, and to the left-hand and 
right-hand sides of an assignment (Propagate), 

— throwing the NllPErr exception when attempting to call a method, access a 
field, or assign to a field of 0 (NullPointErr), 

— accessing variables or addresses (Acc), and assigning to variables (VarAss), 

— creating new objects (New) of an already prepared class c (c ∈ 𝒯s(P)), 
initializing the fields with 0 at the offsets prescribed in P. Note that ℱs(c, P) 
from def. 3 returns types and offsets for all fields declared in class c or in 
any of c's superclasses. 
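
Object creation can be sketched as follows, under the assumption that prepared code is a dictionary from class names to a superclass and a table of field offsets. The helper names `Fs`, `fresh_address` and `new_object` are illustrative, and choosing the address just past the highest occupied cell is only one possible way to obtain a new address.

```python
P = {
    "B": {"super": "Object", "fields": {("int", "f1"): 3, ("C", "f2"): 5}},
    "C": {"super": "B", "fields": {("int", "f1"): 6}},
}


def Fs(c, P):
    """All (type, field, offset) triples declared in c or inherited from its superclasses."""
    if c == "Object":
        return set()
    own = {(t, f, phi) for (t, f), phi in P[c]["fields"].items()}
    return own | Fs(P[c]["super"], P)


def fresh_address(sigma):
    """An address with no defined cell at or beyond it."""
    used = [k for k in sigma if isinstance(k, int)]
    return max(used, default=0) + 1


def new_object(c, sigma, P):
    """Create an object of the prepared class c; every field cell is initialised to 0."""
    if c not in P:
        raise LookupError(f"class {c} is not prepared")  # loading rules are not modelled here
    alpha = fresh_address(sigma)
    updated = dict(sigma)
    updated[alpha] = c                      # the first cell records the class
    for (_type, _field, phi) in Fs(c, P):
        updated[alpha + phi] = 0
    return alpha, updated


sigma0 = {"aB": 5, 5: "B", 8: 45, 10: 11, 11: "C", 14: 55, 16: 0, 17: 65}
alpha, sigma1 = new_object("C", sigma0, P)
```

Because the fields inherited from B keep the offsets 3 and 5 in the new C object, code compiled against B's layout can safely be applied to it.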



3.3 Resolution 

Resolution describes the process of resolving references to fields or methods. 
It corresponds to the bytecode instructions getfield, putfield, invokeinterface and 
invokevirtual. 

We describe these instructions in more detail, and at a lower level than they 
are described in m-- We describe what happens if the instruction attempts 
to access fields or methods from a class which is not a subtype of the type 
stored in the signature, and thus the offset obtained bears no relation to the 
runtime object. This situation is not described in [21], although it may happen 
if unverified code is executed. 

^ We did not supply rules for the propagation of exceptions; these would have been 
standard. 

^ It was, however, discussed to some extent in m- 

Java is probably unique in that resolution happens at runtime, but has both a 
static part, i.e., calculation of offsets in terms of the statically determined offsets, 
and a dynamic part, i.e., application of these offsets to the different objects. 
Thus, the effect of resolution depends on the particular classes or interfaces 
mentioned in the corresponding signature, and on the particular object which 
appears as receiver in the particular expression. In Java implementations these 
two parts may take place at different times. In fact, the static part need only 
take place once and store the calculated offset, whereas the dynamic part has to 
be applied as often as the instruction is executed. For reasons of simplicity, we 
do not describe this in our model. 



Field Resolution. Field access has the form α[t1,t2].f. The offset of that field 
is determined using ℱ_off(f, t1, t2, P), and if found, i.e., if ℱ_off(f, t1, t2, P) = φ, then 
it is used to calculate the address of that field, i.e., α+φ (FldAcc1). Thus, our 
model describes address calculations; in that sense it is at a lower level than 
the models found in the studies of the verifier mentioned in section 3.1. 

Note that the offset calculation ℱ_off(f, t1, t2, P) is in terms of the stored, 
static type t1, and not the actual, dynamic class of the object at α. This offset 
is then applied to the address α, which may, but need not, contain an object 
of class t1. This combination of static with dynamic information is safe if 
applied to a verified expression, to well-formed prepared code, and a well-formed 
state. Namely, as we shall see, verification of an expression e[t1,t2].f guarantees 
that execution of e will return an object of class t1 or a subclass; well-formed 
prepared code guarantees that the object layout of a class conforms to the object layout 
of a superclass; and well-formed states guarantee that all objects in the store are 
organized according to the object layout for their class. 

The rules FldAcc2 and FldAcc3 describe the erroneous situations: If t₁ is defined,
but does not have a field f of type t₂, i.e., F_off(f, t₁, t₂, P) = −1, or if t₁
is an interface, i.e., F_off(f, t₁, t₂, P) = −2, then exceptions are thrown. The case
where F_off(f, t₁, t₂, P) = −3 need not be treated here, as it corresponds to the
case where t₁ has not been prepared yet (cf. the definition of F_off), which is treated
by the rules for loading, verification and preparation, i.e., LoadErr, VerifErr,
LoadPrepVerif.
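The split between the static and the dynamic part of field resolution can be sketched as follows. This is only an illustration under stated assumptions: the class `FieldResolutionSketch`, the representation of prepared code as a map from type names to field layouts, and the `"name:type"` key format are hypothetical and not part of the model; only the result values −1, −2, −3 and the address calculation α+φ are taken from the text.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: field resolution split into a static and a dynamic part. */
final class FieldResolutionSketch {
    static final int NO_FIELD = -1;      // t1 is prepared but has no field f of type t2
    static final int IS_INTERFACE = -2;  // t1 is an interface
    static final int NOT_PREPARED = -3;  // t1 has not been prepared yet

    /**
     * Static part. Assumption of this sketch: prepared code is a map from type
     * name to field layout, and an interface is a key mapped to null.
     */
    static int fieldOffset(String f, String t1, String t2,
                           Map<String, Map<String, Integer>> prepared) {
        if (!prepared.containsKey(t1)) return NOT_PREPARED;
        Map<String, Integer> layout = prepared.get(t1);
        if (layout == null) return IS_INTERFACE;
        Integer offset = layout.get(f + ":" + t2);
        return offset == null ? NO_FIELD : offset;
    }

    /** Dynamic part: apply the stored offset to the receiver address. */
    static int fieldAddress(int alpha, int phi) {
        return alpha + phi;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> prepared = new HashMap<>();
        Map<String, Integer> layoutB = new HashMap<>();
        layoutB.put("f1:int", 3);
        prepared.put("B", layoutB);
        int phi = fieldOffset("f1", "B", "int", prepared);
        System.out.println(fieldAddress(10, phi));
    }
}
```

Note that `fieldOffset` never looks at the receiver: as in the model, the offset depends only on the types stored in the signature, so it could be computed once and cached.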



Instance Method Call Resolution. These calls have the form α[t₁,t₂,t₃].m(β).
The offset is determined using M_off(m, t₁, t₂, t₃, P), which considers m,
the name of the method, t₁, the class containing the method, t₂, the type of
the argument, and t₃, the result. The latter two are necessary for overloading
resolution.

As for fields, the actual class of the receiver, i.e., the class of α, is not
considered in M_off(m, t₁, t₂, t₃, P). If a method is found, i.e., if
M_off(m, t₁, t₂, t₃, P) = φ for some φ, then φ is used to select the method body
from the lookup table of the class of α through M_e(φ, σ(α), P) in MethCall1; here
the actual class of the receiver is used. This combination of static with dynamic
information is safe if applied to a verified expression, and to well-formed prepared
code. Verification of e[t₁,t₂,t₃].m(e') guarantees that execution of e will return
an object of class t₁ or a subclass; well-formed prepared code guarantees that the
method lookup table of a class is a prefix of the method lookup table of any subclass.

³ This is why the configuration e₃, σ₀, P_ABC, L_ε leads to the unsafe configuration
described earlier: namely F_off(f₁, B, int, P_ABC) = 3.

The erroneous situations are described by MethCall2 and MethCall3. If
t₁ is an interface, then M_off(m, t₁, t₂, t₃, P) = −2, and the exception ClssChngErr
is thrown.⁴ If class t₁ exists, but no such method can be found in t₁, the
exception NoMethErr is thrown. The case where t₁ has not been prepared yet,
i.e., M_off(m, t₁, t₂, t₃, P) = −3, is taken care of by the loading, verification and
preparation rules.
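The interplay of the static offset and the dynamic table selection can be sketched as below. The names `MethodDispatchSketch`, `methodOffset`, `methodBody` and `isPrefix`, and the use of lists of strings as lookup tables, are assumptions made for illustration; the interface case (result −2) is omitted for brevity.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of instance method call resolution via lookup tables. */
final class MethodDispatchSketch {
    /** Static part: position of the signature in the table of the static class t1. */
    static int methodOffset(String signature, String t1, Map<String, List<String>> signatures) {
        List<String> table = signatures.get(t1);
        if (table == null) return -3;      // t1 not prepared yet
        return table.indexOf(signature);   // indexOf yields -1 when no such method exists
    }

    /** Dynamic part: select the body from the table of the receiver's actual class. */
    static String methodBody(int phi, String receiverClass, Map<String, List<String>> bodies) {
        return bodies.get(receiverClass).get(phi);
    }

    /** Well-formedness condition: the superclass table is a prefix of the subclass table. */
    static boolean isPrefix(List<String> superTable, List<String> subTable) {
        return subTable.size() >= superTable.size()
            && subTable.subList(0, superTable.size()).equals(superTable);
    }

    public static void main(String[] args) {
        Map<String, List<String>> signatures = new HashMap<>();
        signatures.put("Food", Arrays.asList("bake(Spice):void"));
        signatures.put("Pear", Arrays.asList("bake(Spice):void", "peel():void"));
        Map<String, List<String>> bodies = new HashMap<>();
        bodies.put("Food", Arrays.asList("Food.bake"));
        bodies.put("Pear", Arrays.asList("Pear.bake", "Pear.peel"));
        int phi = methodOffset("bake(Spice):void", "Food", signatures);
        System.out.println(methodBody(phi, "Pear", bodies));
    }
}
```

The prefix condition is what makes it safe to use an offset computed for the static class on the table of any subclass.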



Interface Method Call Resolution. These calls have the form α[t₁,t₂,t₃]ᴵ.m(β).
The method is first looked up in the interface through MI_off(m, t₁, t₂, t₃, P).
If t₁ is a class,⁵ or if the class of the receiver, denoted by σ(α), does not
implement t₁,⁶ then the exception ClssChngErr is thrown. If interface t₁ exists,
but neither contains nor inherits an appropriate method declaration,⁷ then the
exception NoMethErr is thrown. Otherwise, the interface method call proceeds
as an instance method call (IntfMethCall1).

If we compare instance method calls and interface method calls, we notice
that the latter require an extra check, which ascertains that the receiver
implements t₁. Such a check is not necessary for instance method calls,
e₁[t₁,t₂,t₃].m(e₂), because verification guarantees that e₁ will evaluate to an
object of a subtype of t₁. However, the verifier is more lenient with interface
method calls, and verification of e₁[t₁,t₂,t₃]ᴵ.m(e₂) does not guarantee that e₁
will evaluate to an object of a subtype of t₁; therefore this needs to be checked
at the time of execution of the method call.

The case where t₁ has not been prepared yet, i.e., where MI_off(m, t₁, t₂, t₃, P) = −3,
is taken care of by the loading, verification and preparation rules.
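The extra runtime check for interface method calls can be sketched as follows. The class `InterfaceCallSketch`, the two maps, and the returned strings are hypothetical; the sketch assumes t₁ has already been prepared and only distinguishes the outcomes named in the text.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the checks performed when resolving an interface method call. */
final class InterfaceCallSketch {
    static String resolve(String m, String t1, String receiverClass,
                          Map<String, Set<String>> interfaceMethods,  // interface -> declared or inherited methods
                          Map<String, Set<String>> implemented) {     // class -> interfaces it implements
        Set<String> methods = interfaceMethods.get(t1);
        if (methods == null) return "ClssChngErr";                    // t1 is not an interface
        Set<String> receiverInterfaces =
            implemented.getOrDefault(receiverClass, Collections.<String>emptySet());
        if (!receiverInterfaces.contains(t1)) return "ClssChngErr";   // receiver does not implement t1
        if (!methods.contains(m)) return "NoMethErr";                 // no appropriate declaration
        return "proceed as instance method call";
    }

    public static void main(String[] args) {
        Map<String, Set<String>> interfaceMethods = new HashMap<>();
        interfaceMethods.put("Thinker", new HashSet<>(Collections.singleton("be")));
        Map<String, Set<String>> implemented = new HashMap<>();
        implemented.put("Man", new HashSet<String>());                // Man recompiled without Thinker
        System.out.println(resolve("be", "Thinker", "Man", interfaceMethods, implemented));
    }
}
```

The second check is the one that instance method calls do not need, because for them verification already guarantees the subtype relationship.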



3.4 Loading 

Loading is required when a type context is executed for a type t which
has not been loaded yet. That is, when a new object of class t is created, or
when a field of class t is accessed, or when a method from class or interface t is
called.

⁴ This can happen if one compiles t₁ as a class, then compiles the class containing
the method call, then recompiles t₁ as an interface, without further recompilations.

⁵ This can happen if one compiles t₁ as an interface, then compiles the class containing
the method call, then recompiles t₁ as a class, without further recompilations.

⁶ This can happen if one compiles a class c' which is a superclass of σ(α) and
which implements the interface t₁, then compiles the class containing the method
call, then recompiles making sure that none of the superclasses of σ(α) implement
the interface t₁, without further recompilations.

⁷ This can happen if one compiles t₁ with the method declaration, then compiles
the method call, then removes from t₁ the method declaration, and recompiles t₁,
without further recompilations.
If loading is successful, i.e., ld(t, P, L) = L' ≠ L_ε, then execution continues
with the loaded code augmented by L' (Load), otherwise an exception is thrown
(LoadErr). Our operational semantics is non-deterministic with respect to loading:
it allows a load exception to be thrown in all type contexts, without even
attempting to load the types. This simplifies our system considerably, and does
not diminish the applicability of the soundness property.

A loader function ld(t, P, L) returns class or interface definitions for t and
all its superclasses and superinterfaces except for those already defined in P
or L, provided that no class or interface circularity was encountered; otherwise
it returns L_ε. Any function satisfying these requirements is a loader. A “real”
loader would look up type definitions in the filesystem or a database; these can
be modified from outside the Java program, and so different calls of the loader
for the same type can return different bytecode. In order to simplify the model,
rather than providing a filesystem/database parameter, we allow for different
loader functions to be called, thus obtaining the same effect.

Definition 4 A function ld : Id × 𝒫 × ℒ → ℒ is a loader iff:

ld(t, P, L) = L' ≠ L_ε ⇒

- t ∈ Ts(L') \ Ts(PL).

- ⊢ P, L ◇_sups ⇒ ⊢ P, LL' ◇_sups.
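One function satisfying the informal description of a loader can be sketched as below. The repository map stands in for the filesystem or database mentioned above; `LoaderSketch`, `load`, and the use of `null` for the error result L_ε are assumptions of this sketch, and only type names (not full definitions) are returned.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical loader sketch: collects t and all supertypes that are not yet defined. */
final class LoaderSketch {
    /** Returns the names to be loaded, or null (standing for the error result) on failure. */
    static Set<String> load(String t, Set<String> defined, Map<String, List<String>> repository) {
        if (defined.contains(t)) return null;  // t must be among the newly loaded types
        Set<String> loaded = new LinkedHashSet<>();
        return visit(t, defined, repository, loaded, new HashSet<String>()) ? loaded : null;
    }

    private static boolean visit(String t, Set<String> defined, Map<String, List<String>> repository,
                                 Set<String> loaded, Set<String> onPath) {
        if (defined.contains(t) || loaded.contains(t)) return true;      // already available
        if (!repository.containsKey(t) || !onPath.add(t)) return false;  // missing or circular
        for (String supertype : repository.get(t)) {
            if (!visit(supertype, defined, repository, loaded, onPath)) return false;
        }
        onPath.remove(t);
        loaded.add(t);  // supertypes are recorded before the type itself
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> repository = new HashMap<>();
        repository.put("Pear", Arrays.asList("Food"));
        repository.put("Food", Arrays.<String>asList());
        System.out.println(load("Pear", new HashSet<String>(), repository));
    }
}
```

Passing a different repository map plays the role of calling a different loader function in the model.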



3.5 Verification 

Verification is required when executing a type context for t, and t has been
loaded but not yet prepared, i.e., t ∈ Ts(L₁) \ Ts(P). The loaded code consists
of L₁ and L₂, where L₁ contains the definition of t and its supertypes, except for
those already defined in P, i.e., ⊢ P, L₁ ◇_sups. Then L₁ is verified. If
verification succeeds and requires the loading of L', then L₁ is prepared, and execution
continues with the augmented prepared code P₁, and additional loaded code L',
cf. VerifAndPrep. If verification fails, an exception is thrown, cf. VerifErr.
As for loading, our operational semantics is highly nondeterministic with respect
to verification: it allows a verification error to be thrown in all contexts which
require verification, without requiring the verification to have been attempted
and failed. This allows for a simpler model, and simpler proofs, and does not
diminish the applicability of the soundness property.

Verification in our paper is described in fig. 7. It corresponds to the third
pass of the “real” verifier as in ch. 4.9.1 of the Java Virtual Machine
specification, and is expressed through the judgment

    P, L ⊢_v L'' ◇ ⇝_loads L'

meaning that the binary L'' could be verified in the context of the prepared code
P, and the loaded but not yet prepared code L, and caused L' to be loaded (but
not verified). Thus, this judgment has the “side-effect” of loading L'.

Verification of classes is defined in terms of verification of expressions, with
the judgment

    P, L, E ⊢_v e : t ⇝_loads L'

meaning that the expression e could be verified as having type t, in the context
of P, L, and the environment E, and caused L' to be loaded (but not verified).
Establishing the above sometimes requires a judgment

    P, L ⊢_v t ≤ t' ⇝_loads L'

meaning that type t could be verified as widening to type t' in the context of
P and L, and caused further classes/interfaces L' to be loaded (but not
verified). Classes or interfaces may be loaded when trying to establish whether a t
undefined in P or L is a subtype of t', as in rules (5) and (6) of fig. 7.

For example, verification of

    e_bake = new Pear[Food, Spice, void].bake(new Spice)

requires establishing that Pear widens to Food, which, in its turn, if Pear is not
loaded, requires loading Pear and all its superclasses. Therefore, if

    ld(Pear, P_∅, L_∅) = L_Pear L_Food

and the superclass of Pear is Food, then:

    P_∅, L_∅ ⊢_v Pear ≤ Food ⇝_loads L_Pear L_Food.

The difference between (5) and (6) is that in (5) class c and all its superclasses
are loaded, whereas in (6) only interface i and its superinterfaces are loaded.

The assertion P, L ⊢_v t ≤ t ⇝_loads L_∅ holds for any t, cf. rule (1). Thus,
verification assumes any identifier to stand for a class or interface, and so to
widen to itself. Therefore,

    P_∅, L_∅ ⊢_v Spice ≤ Spice ⇝_loads L_∅.

Also, the assertion P, L ⊢_v t ≤ i ⇝_loads L_∅ holds for any interface i, cf. rules
(3) and (6). Thus verification assumes any identifier to widen to i, provided that
i stands for an already loaded or prepared interface.
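The way widening checks trigger loading can be sketched as below. This is one deterministic strategy chosen for illustration, whereas the rules themselves may allow several derivations; `WideningSketch`, `TypeDef` and the repository map are hypothetical, and for rule (6) the sketch loads only the interface itself, not its superinterfaces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: establishing t <= target while recording what had to be loaded. */
final class WideningSketch {
    static final class TypeDef {
        final boolean isInterface;
        final String superclass;   // null at the root of the hierarchy and for interfaces
        TypeDef(boolean isInterface, String superclass) {
            this.isInterface = isInterface;
            this.superclass = superclass;
        }
    }

    /** Returns the names loaded, or null if this strategy cannot establish the widening. */
    static List<String> widens(String t, String target,
                               Map<String, TypeDef> known, Map<String, TypeDef> repository) {
        List<String> loaded = new ArrayList<>();
        if (t.equals(target)) return loaded;                            // rule (1)
        Map<String, TypeDef> context = new HashMap<>(known);
        TypeDef targetDef = context.get(target);
        if (targetDef != null && targetDef.isInterface) return loaded;  // rule (3)
        if (targetDef == null && repository.containsKey(target)
                && repository.get(target).isInterface) {
            loaded.add(target);                                         // rule (6)
            return loaded;
        }
        for (String c = t; c != null && !context.containsKey(c); c = context.get(c).superclass) {
            TypeDef def = repository.get(c);                            // rule (5): load t and superclasses
            if (def == null) return null;
            context.put(c, def);
            loaded.add(c);
        }
        String c = t;
        for (int steps = 0; c != null && steps <= context.size(); steps++) {
            if (c.equals(target)) return loaded;                        // rule (2): chain reaches target
            TypeDef def = context.get(c);
            c = def == null ? null : def.superclass;
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, TypeDef> repository = new HashMap<>();
        repository.put("Pear", new TypeDef(false, "Food"));
        repository.put("Food", new TypeDef(false, null));
        System.out.println(widens("Pear", "Food", new HashMap<String, TypeDef>(), repository));
    }
}
```

With an empty context, checking that Pear widens to Food loads both classes, while checking that Spice widens to Spice loads nothing, mirroring the examples above.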

Verification is “optimistic” with respect to method calls and field accesses
(rules (10) and (11)), and more liberal than the Java source checks. For field
access, e₁[t₁,t₂].f, verification only checks that the type of e₁ widens to t₁, the
static type in the signature, and gives to the whole expression the type t₂; it
does not attempt to check the existence of a field with type t₂, but leaves this
to the resolution checks. Similarly for method calls. Therefore, verification of
e_bake will load Food and Pear, and not Spice, and will not verify either of these
classes, i.e.,

    P_∅, L_∅, ε ⊢_v e_bake : void ⇝_loads L_Pear L_Food

Verification of a class (rule (13)) does not imply verification of all classes
used: If L_Cook contained a unique method

    void boil(Spoon x){ x = new Spoon; e_bake }

then, even though the classes Pear, Food, Spice and Spoon are mentioned, the
verification of L_Cook only requires class Pear and all its superclasses to be loaded.
Thus,

    P_∅, L_∅ ⊢_v L_Cook ◇ ⇝_loads L_Food L_Pear.

Finally, if an order can be found to verify classes and/or interfaces tᵢ, then
verification is successful, cf. rule (15). Note that judgment P, L ⊢ c ≤_clss c
means that c is the name of a class, whereas P, L ⊢ i ≤_intf i means that i is
the name of an interface.
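(1)   t ∈ Id
      ────────────────────────────────
      P, L ⊢_v t ≤ t ⇝_loads L_∅

(2)   P, L ⊢ c ≤_clss c'
      ────────────────────────────────
      P, L ⊢_v c ≤ c' ⇝_loads L_∅

(3)   t ∈ Id
      P, L ⊢ i ≤_intf i
      ────────────────────────────────
      P, L ⊢_v t ≤ i ⇝_loads L_∅

(4)   P, L ⊢_v int ≤ int ⇝_loads L_∅

(5)   T(c, PL) = ε
      ld(c, P, L) = L'
      P, LL' ⊢ c ≤_clss c'
      ────────────────────────────────
      P, L ⊢_v c ≤ c' ⇝_loads L'

(6)   T(i, PL) = ε
      ld(i, P, L) = L'
      P, LL' ⊢ i ≤_intf i
      ────────────────────────────────
      P, L ⊢_v t ≤ i ⇝_loads L'

(7)   P, L, E ⊢_v β : int ⇝_loads L_∅
      P, L, E ⊢_v 0 : c ⇝_loads L_∅
      P, L, E ⊢_v new c : c ⇝_loads L_∅

(8)   E(y) = t
      ────────────────────────────────
      P, L, E ⊢_v y : t ⇝_loads L_∅

(9)   P, L, E ⊢_v v : t ⇝_loads L'
      P, LL', E ⊢_v e : t' ⇝_loads L''
      P, LL'L'' ⊢_v t' ≤ t ⇝_loads L'''
      ────────────────────────────────
      P, L, E ⊢_v v = e : t' ⇝_loads L'L''L'''

(10)  P, L, E ⊢_v e₁ : t₁' ⇝_loads L₁
      P, LL₁, E ⊢_v e₂ : t₂' ⇝_loads L₂
      P, LL₁L₂ ⊢_v t₁' ≤ t₁ ⇝_loads L₃
      P, LL₁L₂L₃ ⊢_v t₂' ≤ t₂ ⇝_loads L₄
      ────────────────────────────────
      P, L, E ⊢_v e₁[t₁,t₂,t₃].m(e₂) : t₃ ⇝_loads L₁L₂L₃L₄

(11)  P, L, E ⊢_v e : t ⇝_loads L'
      P, LL' ⊢_v t ≤ t₁ ⇝_loads L''
      ────────────────────────────────
      P, L, E ⊢_v e[t₁,t₂].f : t₂ ⇝_loads L'L''

(12)  P, L, E ⊢_v e₁ : t₁' ⇝_loads L₁
      P, LL₁, E ⊢_v e₂ : t₂' ⇝_loads L₂
      P, LL₁L₂ ⊢_v t₂' ≤ t₂ ⇝_loads L₃
      ────────────────────────────────
      P, L, E ⊢_v e₁[t₁,t₂,t₃]ᴵ.m(e₂) : t₃ ⇝_loads L₁L₂L₃

(13)  n ≥ 0
      T(c, L) = c', {i₁...iₙ}
      P, L ⊢ c' ≤_clss c'
      P, L ⊢ i_j ≤_intf i_j                                          ∀j∈1...n
      M_s(c, L) = { t₁₁ m₁(t₁₂ x){e₁}, ..., t_k1 m_k(t_k2 x){e_k} }
      P, LL₁...L_(2i-2), (t_i2 x, c this) ⊢_v e_i : t_i' ⇝_loads L_(2i-1)   ∀i∈1...k
      P, LL₁...L_(2i-1) ⊢_v t_i' ≤ t_i1 ⇝_loads L_(2i)                    ∀i∈1...k
      ────────────────────────────────
      P, L ⊢_v c ◇ ⇝_loads L₁...L_(2k)

(14)  n ≥ 0
      T(i, L) = {i₁...iₙ}
      P, L ⊢ i_j ≤_intf i_j                                          ∀j∈1...n
      ────────────────────────────────
      P, L ⊢_v i ◇ ⇝_loads L_∅

(15)  ⊢ P, L ◇_sups
      {t₁,...tₙ} = Ts(L')
      P, LL₁'...L_(i-1)' ⊢_v tᵢ ◇ ⇝_loads Lᵢ'                         ∀i∈1...n
      ────────────────────────────────
      P, L ⊢_v L' ◇ ⇝_loads L₁'...Lₙ'

Fig. 7. Verification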
Env ::= ε | Env, t z | Env, t this

      E(z) ≠ ε ⇒ E'(z) = E(z)
      ────────────────────────────────
      ⊢ E' ≤ E

Fig. 8. Environments



Verification requires type assignments, expressed through environments, E.
Environments are sequences of declarations of the form tᵢ varᵢ; they are described
in fig. 8. They should contain unique declarations, as expressed by the judgment
⊢ E ◇ᵤ, and allow looking up the type of variable z through E(z).⁸ We do
not require the tᵢ to indicate types declared in P or L. So, an environment may
use identifiers as types which have no corresponding definition in P or L.



3.6 Preparation 

If verification is successful, code is prepared using the function pr : ℒ × 𝒫 → 𝒫,
which maps L to pr(L, P). Preparation determines the object layout (i.e.,
assigns offsets to fields), and creates method lookup tables (i.e., assigns offsets to
methods, and method bodies to offsets).

Rather than prescribe the exact strategy for offset determination, we give
requirements in definition 5, i.e., a mapping is a preparation function if it
maps all types from L onto corresponding types in P' with the same superclasses
and superinterfaces (1), allocates distinct offsets to fields (2a), preserves field
offsets from superclasses (2b), preserves method offsets from superclasses (2c), and
ensures that all valid offsets lead to a method body either defined for that class
in L, or inherited from a superclass (2d).

Definition 5 A function pr : ℒ × 𝒫 → 𝒫 is a preparation function iff:
⊢ P, L ◇_sups, pr(L, P) = P' ⇒

1. ∀t: T(t, L) = T(t, P').

2. T(c, L) = c', {i₁,...iₙ}, t ∈ Ts(L) ⇒

   ∀ f, f', t, t', t₁, t₂, m:

   a) F_off(f, c, t, P') = F_off(f', c, t', P') ≥ 0 ⇒ f = f', t = t'.

   b) F_off(f, c', t, PP') ≥ 0 ⇒ F_off(f, c', t, PP') = F_off(f, c, t, P').

   c) M_off(m, c', t₂, t₁, PP') ≥ 0 ⇒
      M_off(m, c', t₂, t₁, PP') = M_off(m, c, t₂, t₁, P').

   d) M_off(m, c, t₂, t₁, P') = φ ⇒
      M_e(φ, c, P') = M(m, c, t₂, t₁, L) ≠ ε, or M_off(m, c', t₂, t₁, PP') = φ.

Elsewhere we gave a constructive definition of such a preparation function. In
general, many different results may come from a preparation function, because there
may be many different offset allocation strategies.
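One possible offset allocation strategy for fields can be sketched as follows. It is only an example of a strategy that keeps field offsets distinct and preserves the offsets inherited from the superclass; `PreparationSketch`, `layout`, the `"name:type"` keys, and the choice to start at offset 1 (assuming offset 0 holds the class of the object) are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of one field-offset allocation strategy. */
final class PreparationSketch {
    /** Extends the superclass layout with the fields declared in the class itself. */
    static Map<String, Integer> layout(Map<String, Integer> superLayout, List<String> declaredFields) {
        Map<String, Integer> result = new LinkedHashMap<>(superLayout);  // inherited offsets are preserved
        int next = 1;                                                    // offset 0 is assumed to hold the class
        for (int used : superLayout.values()) next = Math.max(next, used + 1);
        for (String field : declaredFields) {
            // Assumption: a field key already inherited is not allocated a second time.
            if (!result.containsKey(field)) result.put(field, next++);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> layoutA = layout(Collections.<String, Integer>emptyMap(), Arrays.asList("f1:int"));
        Map<String, Integer> layoutB = layout(layoutA, Arrays.asList("f2:int", "f3:A"));
        System.out.println(layoutA + " " + layoutB);
    }
}
```

A different strategy, for instance one that leaves gaps or orders fields by type, would also qualify as long as it meets the same two requirements.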
⁸ We do not define ⊢ E ◇ᵤ, nor E(z), because they are standard.
(1)     t ∈ Id
        ────────────────────────────────
        P, L ⊢ t ≤ t

(4)     P, L ⊢ int ≤ int

(2,5)   P, L ⊢ c ≤_clss c'
        ────────────────────────────────
        P, L ⊢ c ≤ c'

(3,6)   t ∈ Id
        P, L ⊢ i ≤_intf i
        ────────────────────────────────
        P, L ⊢ t ≤ i

(7)     P, L, E ⊢ β : int
        P, L, E ⊢ 0 : c
        P, L, E ⊢ new c : c

(8)     E(y) = t
        ────────────────────────────────
        P, L, E ⊢ y : t

(9)     P, L, E ⊢ v : t
        P, L, E ⊢ e : t'
        P, L ⊢ t' ≤ t
        ────────────────────────────────
        P, L, E ⊢ v = e : t'

(10)    P, L, E ⊢ e₁ : t₁'
        P, L, E ⊢ e₂ : t₂'
        P, L ⊢ t₁' ≤ t₁
        P, L ⊢ t₂' ≤ t₂
        ────────────────────────────────
        P, L, E ⊢ e₁[t₁,t₂,t₃].m(e₂) : t₃

(11)    P, L, E ⊢ e : t
        P, L ⊢ t ≤ t₁
        ────────────────────────────────
        P, L, E ⊢ e[t₁,t₂].f : t₂

(12)    P, L, E ⊢ e₁ : t₁'
        P, L, E ⊢ e₂ : t₂'
        P, L ⊢ t₂' ≤ t₂
        ────────────────────────────────
        P, L, E ⊢ e₁[t₁,t₂,t₃]ᴵ.m(e₂) : t₃

(13)    T(c, P) = c', {i₁,...iₙ}    n ≥ 0
        P, L ⊢ c' ≤_clss c'
        P, L ⊢ i_j ≤_intf i_j                                           ∀j∈1...n
        F_off(f, c', t, P) ≥ 0 ⇒ F_off(f, c', t, P) = F_off(f, c, t, P)     ∀f,t
        F_off(f, c, t, P) = F_off(f', c, t', P) ≥ 0 ⇒ f = f', t = t'        ∀f,f',t,t'
        M_off(m, c', t, t', P) ≥ 0 ⇒ M_off(m, c, t, t', P) = M_off(m, c', t, t', P)   ∀m,t,t'
        M_off(m, c, t, t', P) = φ ⇒
          M_e(φ, c, P) = e, P, L, (t x, c this) ⊢ e : t₁, P, L ⊢ t₁ ≤ t'    ∀m,t,t'
        ────────────────────────────────
        P, L ⊢ c ◇

(14)    T(i, P) = {i₁,...iₙ}    n ≥ 0
        P, L ⊢ i_j ≤_intf i_j                                           ∀j∈1...n
        ────────────────────────────────
        P, L ⊢ i ◇

(15)    ⊢ P, L ◇_sups
        ∀t∈Ts(P): P, L ⊢ t ◇
        ────────────────────────────────
        L ⊢ P ◇

Fig. 9. Well-formed prepared code - rule numbering consistent with that for verification
The code P_BC introduced earlier is the result of the application of a preparation
function on L_BC.

Interestingly, definition 5 does not pose any requirements on interfaces. It
does not require inheritance of methods from superinterfaces (i.e., that
T(i, P') = {...i'...} and MI_off(m, i', t₂, t₃, PP') = φ implies that
MI_off(m, i, t₂, t₃, P') = φ). Even though this is a property of Java implementations,
and an integral part of soundness of the Java source language, it is not required
for soundness at this level (namely, at the level of ℒ or 𝒫 expressions, the types
are largely determined by the signatures). This reflects how weak the notion of
interfaces is. Note also that fields were not reflected in ℒ code, but they are in
𝒫 code.

4 Soundness 

4.1 Well Formed Prepared Code 

The judgment L ⊢ P ◇, defined in fig. 9, guarantees that the prepared code
P is well formed in the context of loaded code L. Well-formedness is a similar
requirement to verification, in the sense that the types of expressions are
checked, and subtype relationships implied through the type annotations need
to be established. For this reason we organized fig. 9 in a similar way to fig. 7.
As in verification, well-formedness of prepared code does not guarantee the
existence of fields or methods required in method bodies. In contrast to
verification, well-formedness of prepared code does not cause loading of further binaries.
Also, while judgment P, L ⊢_v L' ◇ ⇝_loads L'' represents checks that are performed
by Java implementations, the judgment L ⊢ P ◇ is only a vehicle for proving
soundness.

The main requirements for well-formedness of prepared code are:

- all classes/interfaces defined in P have their superclasses/superinterfaces
  in P,
- identifiers mentioned as superclasses belong to classes,
- identifiers mentioned as superinterfaces belong to interfaces,
- fields have distinct offsets,
- fields defined in a superclass c' have the same offsets in a subclass c,
- methods defined in a class c' have the same offsets in a subclass c,
- method bodies are well-formed and respect their signatures.

As for preparation, the requirements posed on interfaces are very weak.



4.2 Conformance and Runtime Types 

The judgments σ, P ⊢ β ◇ and P, E ⊢ σ ◇, defined in rules (1)-(5) of fig. 10,
express conformance of values to types, and of states to programs and
environments.

The judgment σ, P ⊢ α ◇ in rule (4) expresses that the object stored at
α conforms to its class. The class of the object, c, is stored at the beginning
of the object. For all fields of c, the object must contain appropriate values at
the corresponding offsets. In order to obtain a well-founded relation, we defined
conformance in terms of the auxiliary weak conformance judgment σ, P ⊢ β : t.
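The shape of the object conformance check can be sketched as below. The sketch makes strong simplifying assumptions that are not part of the model: the store is a map from addresses to cell contents, a `String` cell marks the beginning of an object and names its class, every other cell is an `Integer`, and the weak conformance of each field value to its declared type is not checked.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified sketch of checking that an object conforms to its class. */
final class ConformanceSketch {
    static boolean objectConforms(int alpha, Map<Integer, Object> store,
                                  Map<String, Map<String, Integer>> layouts) {
        Object header = store.get(alpha);
        if (!(header instanceof String)) return false;   // the class is stored at the beginning
        Map<String, Integer> layout = layouts.get(header);
        if (layout == null) return false;                // the class must be prepared
        int maxOffset = 0;
        for (int phi : layout.values()) maxOffset = Math.max(maxOffset, phi);
        for (int phi = 1; phi <= maxOffset; phi++) {
            // Every cell of the object must hold a value, never the start of another object.
            if (!(store.get(alpha + phi) instanceof Integer)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> layouts = new HashMap<>();
        Map<String, Integer> layoutB = new HashMap<>();
        layoutB.put("f1:int", 3);
        layouts.put("B", layoutB);
        Map<Integer, Object> store = new HashMap<>();
        store.put(5, "B");
        store.put(6, 0);
        store.put(7, 0);
        store.put(8, 42);
        System.out.println(objectConforms(5, store, layouts));
    }
}
```

The inner loop corresponds to the requirement that no object is stored inside another object.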
[Figure: inference rules (1)-(5) for the conformance of values, objects and stores,
and rules (6)-(12) for the types of runtime expressions.]

Fig. 10. Conformance, and types of runtime expressions
Notice that a positive value β may conform to both int and a class type, and
to any interface type. For example, cro, Pabc Ku 5 : int, and (Tq, Pabc Im 5 : B, 
but (Toj Pabc b™ 5 : A. Also, if Pintfi contains the declaration of an interface 
Intfl, then cto, PintfiPABC h, 5 : Intfl, and cto, PintfiPABC 6 : Intfl. Finally, the 
requirement ∀φ' < φ: σ(α + φ') ∉ Id ensures that no object is stored “inside”
another object, and is used to prove that execution does not affect the type of
expressions (lemma 5).

The judgment P, E ⊢ σ ◇, defined in rule (5), expresses that the store σ
conforms to prepared program P and to variable declarations in E, and requires
that:

- all classes/interfaces defined in P have their superclasses/superinterfaces
  in P,
- the classes of all objects stored in σ are defined in P,
- all objects stored in σ conform to their class,
- all variables defined in E have in σ values appropriate to their types.

Notice that store conformance does not take the loaded, not yet verified
code L into account. This can be seen from the form of the judgments. Also, 0
conforms to any class, allowing objects with a field initialized to 0, belonging to
a yet undefined class.

Types for runtime expressions are given by the judgment P, L, E ⊢ σ, e : t,
defined in rules (6)-(12) in fig. 10. The rules are similar to well-formedness, with
the difference that for runtime expressions the store σ is taken into account.



4.3 Locality and Preservation of Judgments 

In general, one expects properties established in a certain context to hold for 
larger contexts as well. Locality properties were proven in 0, used in 0, and 
explored in our model of binary compatibility j^. 

We prove that judgments in the context of P and L₁L₂ are preserved if loaded
code is replaced by prepared code which has the same subclass/subinterface
information (i.e., replace L₁ by P₁, where T(t'', P₁) = T(t'', L₁) for all t''), the
loaded code is augmented by L₃, and the environment E is extended to E':

Lemma 2 For all P, P₁, L, L₁, L₂, L₃, e, t, t', σ, E, E', if T(t'', P₁) = T(t'', L₁)
for all t'', and ⊢ E' ≤ E then:

- T(t, PL₁L₂) ≠ ε ⇒ T(t, PL₁L₂) = T(t, PP₁L₂L₃).

- P, L₁L₂ ⊢ t ≤ t' ⇒ PP₁, L₂L₃ ⊢ t ≤ t'.

- P, L₁L₂, E ⊢ e : t ⇒ PP₁, L₂L₃, E' ⊢ e : t.

- P, L₁L₂, E ⊢ σ, e : t ⇒ PP₁, L₂L₃, E' ⊢ σ, e : t.



Verification of classes implies verification of the bodies of their methods: 



Lemma 3 For any P, L, L', L'', c, if

    P, L ⊢_v L'' ◇ ⇝_loads L', and M(m, c, t₁, t₂, L'') = e,

then there exist t₁', L₁', L₂', L₃' and L₄' such that L' = L₁'L₂'L₃'L₄', and

    P, LL₁', (t₂ x, c this) ⊢_v e : t₁' ⇝_loads L₂',

and P, LL₁'L₂' ⊢_v t₁' ≤ t₁ ⇝_loads L₃'.



Preparation of verified code preserves judgments: 
Lemma 4 For any P, P₁, L₁, L₂, L₃, e, t, E, σ, where P, L₁L₂ ⊢_v L₁ ◇ ⇝_loads L₃,
if P₁ = pr(L₁, P), then:

- P, L₁L₂ ⊢ t ≤ t' ⇒ PP₁, L₂L₃ ⊢ t ≤ t'.

- P, L₁L₂, E ⊢ e : t ⇒ PP₁, L₂L₃, E ⊢ e : t.

- L₁L₂ ⊢ P ◇ ⇒ L₂L₃ ⊢ PP₁ ◇.

- P, E ⊢ σ ◇ ⇒ PP₁, E ⊢ σ ◇.

- P, L₁L₂, E ⊢ σ, e : t ⇒ PP₁, L₂L₃, E ⊢ σ, e : t.

4.4 Subject Reduction and Progress 

Execution of a well-typed expression e does not overwrite objects, creates new 
objects in the free space, and does not affect the type of any expression e" - even 
if e" were a subexpression of e! Such a property is required for type soundness 
in imperative object oriented languages, and was proven, e.g., , in [KI2t)j . In the 
current work this property holds only when well-typed expressions are executed. 

Lemma 5 For P, L, E, σ, non-ground e, t, c ∈ Id, if

- L ⊢ P ◇, and

- P, L, E ⊢ σ, e : t, and

- e, σ, P, L ⇝ e', σ', P', L',

then

- σ(α) = c ⇒ σ'(α) = c,

- σ'(α) = c ⇒ σ(α) = c or α new in σ,

- P, E ⊢ σ ◇ ⇒ P', E ⊢ σ' ◇,

- P, L, E ⊢ σ, e'' : t'' ⇒ P', L', E ⊢ σ', e'' : t''.

Proof by structural induction over the derivation, and for the fourth part
of the lemma, in the cases of VarAss or FldAcc1, by structural induction over
the typing of e'', using the store conformance requirement whereby no object is
stored within another object.

Lemma 6 (Progress) For any P, L, E, σ, t, non-ground e, if e does not contain
an exception, then there exist P', L', σ', e', such that e, σ, P, L ⇝
e', σ', P', L'.

Theorem 1 (Subject reduction) For any P, L, t, E, e, e', σ, if

- L ⊢ P ◇, and

- P, L, E ⊢ σ, e : t, and

- e, σ, P, L ⇝ e', σ', P', L',

then

- L' ⊢ P' ◇, and

  • P', L', E' ⊢ σ', e' : t', and P', L' ⊢ t' ≤ t, for appropriate E', t',
    and t = t' if e is a non-l-ground variable,
  or
  • e' contains an exception.

Proof by structural induction over typing P, L, E ⊢ σ, e : t.
Thus, the new, possibly augmented, prepared code, P', preserves its
well-formedness, and the store σ' preserves conformance. Uninitialized parts of the
store, where σ(α) = ε, are never dereferenced. Finally, execution never gets
stuck.

5 Summary and Alternatives 

Verification of class c requires verification of all methods in c and all its (not 
yet prepared) superclasses. Verification of terms requires establishing subtype 
relations between types t and t'. If t has not been loaded yet, then it will be 
loaded with all its superclasses, except if t and t' are identical, or t' is an interface. 
Verification does not ensure the presence of fields or methods, it only ensures 
that all methods in a verified class respect their signatures. Resolution checks 
for the presence of fields and methods of given signatures. Thus the verifier relies 
on resolution to detect some of the possible errors, and resolution is safe only on 
code previously checked by the verifier. 

The system does not guard against link-time errors (i.e., LoadErr, or VerifErr,
or NoMethErr, or NoFldErr, or ClssChngErr), but it does guarantee the integrity
of the store. On the other hand, execution of unverified code may overwrite any
part of the memory, and execute any methods.

Our model is independent of Java reflection: We represented prepared and
loaded code as separate entities of the configuration, rather than as objects of
class Class in the store σ. This abstraction from “real” implementations allows
us to demonstrate in the format of the judgments how the various components
depend on each other. Namely:

- F_off(f, c, t, P), M_off(m, c, t₂, t₁, P), and MI_off(m, i, t₂, t₁, P) show that
  offsets are looked up in the prepared code only, and the operational semantics
  rules show that they depend on the types t₂ and t₁ stored in signatures, and not
  the runtime types of the objects.

- e, σ, P, L₁L₂ ⇝ e', σ', PP₁, L₂L₃ shows which components may be affected
  by execution.

- P, L ⊢_v L' ◇ ⇝_loads L'' shows that verification takes the prepared and the
  loaded code into account, does not take the store into account, and that it
  may load further code.

- L ⊢ P ◇ shows that well-formedness of prepared code takes the prepared
  and the loaded code into account, but does not load further code.

- P, E ⊢ σ ◇ shows that conformance of a store depends on the prepared
  code and not on the loaded code; in particular, any objects in σ must belong
  to prepared classes.

- P, L, E ⊢ σ, e : t shows that types of runtime expressions depend on
  the prepared code, but also on the store and on the loaded code (the latter
  because of the arguments to method calls).

- pr(L, P) shows that preparation depends on the code already prepared, and
  on the loaded code to be prepared, but does not depend on the remaining
  loaded code.

- The role of the loaded code L in checking is limited; the only information
  extracted from L is which class/interface extends/implements which other
  class/interface, but the contents of the classes/interfaces are ignored.
Link-time errors can occur also when running code that was produced
by a compiler, as shown in the various footnotes. However, link-time errors
will not occur if one recompiles all importing classes/interfaces and all
subclasses/subinterfaces after recompiling a class or interface; we have not
demonstrated this yet.

It is interesting that interfaces are treated by verification more leniently than 
classes, and thus require more runtime checks. It would have been possible to 
treat classes as leniently, or to treat interfaces more strictly. 

In current implementations the boundary of decomposition is at classes or 
interfaces. That is, we load several classes or interfaces together, and we verify 
several classes or interfaces together. Is it possible to consider other levels of 
decomposition? A probably less attractive, more lazy alternative would put the 
boundary of decomposition at methods, and would verify method bodies only 
before they are first called. This would make the judgment L h P O even weaker, 
and would extend the operational semantics to verify method bodies on a per 
call basis, and check for previous verification. 

Another lazy alternative, as suggested in in |1 1 H,‘I] and formalized in I23I, 
instead of immediately establishing that t is a subtype of t' would post a con- 
straint requiring t to be a subtype of t', to be validated only when t is loaded. 
This would treat L's as constraints, and the judgment P, L, E ⊢_v e : t ⇝ L' to
mean that the verifier established e to have type t, while posting L'.

It is easy to modify our model to express the above alternatives. More
challenging would be a unified framework that would allow us to characterize all such
alternatives.

6 Conclusions, Discussion, and Further Work 

We have given a model for the five execution components, and have demonstrated
how the corresponding checks together ensure type soundness. Our model is
at a high level, and distinguishes the components and the time of the
associated checks. Thus, our account is useful for source language programmers,
designers of new binary formats for Java, and designers of alternative
distributions of the checks among the four components. The format of the judgments
reflects the dependencies of the components. We do not yet treat multiple
loaders.

Formal treatments of linking were suggested in |^, albeit in a static setting. 
Dynamic linking at a fundamental level has been studied in IHGEEI, allowing 
for modules as first class values, usually untyped, concentrating on confluence 
and optimization issues. Recently, discuss dynamic linking of native code 
as an extension of Typed Assembly Language without expanding the trusted 
computing base, while |H| takes a higher-level view and suggests extensions of 
Typed Assembly Language to support type safe dynamic linking of modules and 
sharing. The above works are based on structural type equivalence, higher order 
types, and linking as a one-phase transformation which binds free references; 
Java however, has name type equivalence, first order types, and its resolution is 
a multiple phase activity. 

Recent work on Java linking P3T1T1 complements ours. They both uncov- 
ered errors in current verifiers, in that insufficient constraints were posted for 
the arguments of inherited methods, or the arguments of a class implementing
an interface. suggest a model of Java evaluation, preparation, verification 
and loading at the bytecode level, without interfaces, but with multiple load- 
ers. Their approach is lazier than that of SUN implementations, and verification 
posts constraints as opposed to loading classes. m provide a model of Java 
evaluation, preparation, verification and multiple loaders, describing interfaces, 
but only treating method calls. Both approaches adopt a higher level runtime 
model than ours, and thus do not demonstrate how unverified code can destroy 
the consistency of the store. Furthermore, the above works consider a couple
of bytecode instructions, do not describe complete classes or interfaces, and do not
distinguish clearly between loading and verification.

The current paper is an improvement over the work presented at TIC 0 . The 
adoption of non-deterministic operational semantics, and the use of the look-up 
functions T, M_off, F_off, M_e, M_s as opposed to complete program code,
allowed a more concise, abstract account.

Further work includes refining the model to allow multiple class loaders, 
extending the model to describe the source language and the compilation process, 
extending languages ℒ and 𝒫 with more Java features, considering different
levels of decomposition, applying the model to reconsider the meaning of binary 
compatibility Q. 

Finally, though Java is novel in its approach to verification and dynamic link- 
ing, similar components and associated checks could be defined for any language 
that supports some concept of modularity. The generalization of such ideas to 
other programming languages is an open issue. 

Acknowledgments. I am deeply indebted to the TIC referees for extensive and 
very useful feedback. One of the referees, in particular, provided many insightful 
remarks and valuable suggestions that have improved this work considerably. 
Earlier versions of this paper have benefited from input from David Wragg, 
Tatyana Valkevych, Susan Eisenbach, Mark Skipper, Elena Zucca, and Eugenio 
Moggi.

An Example Demonstrating Interfaces

The following example demonstrates the verifier’s and run-time system’s treatment
of interfaces. It is an adaptation of the example which was posted by Martin
Buechi [2] on the types mailing list, and was then discussed at some length.

We start with an interface Thinker implemented by class Man, and the class 
Main with method main: 



interface Thinker { void be(); }

class Man implements Thinker {
    public void be() { System.out.println("be"); }
}

class Main {
    public static void main(String args[]) {
        Thinker descartes;
        Man john = new Man();
        System.out.println("a Man object created");
        if (john instanceof Thinker)
            System.out.println("John is a Thinker");
        else
            System.out.println("John is NOT a Thinker");
        descartes = new Man();
        System.out.println("a Man assigned to a Thinker");
        john.be();
    }
}

We compile Thinker, Man and Main, and we then modify class Man, so that it 
does not implement Thinker, i.e., 

class Man { void be() { System.out.println("be"); } }

We compile Man, without re-compiling Main. When we execute Main, we 
obtain the output: 

a Man object created
John is NOT a Thinker
a Man assigned to a Thinker
IncompatibleClassChangeError: class Man does not implement Thinker

The above behavior is described by our model, namely:

— Verification of method main considers the assignment descartes = new 
Man(); as type correct, because the verifier is “liberal” with respect to
interfaces.

— Verification of the interface method call john.be() requires loading of the 
interface Thinker. 

— Verification of method main does not need to load class Man. 

— The assignment descartes = new Man(); is executed without any checks, 
and therefore without errors. 

— The interface method call john.be() is compiled to a bytecode term 
which corresponds to john[Thinker,void,void]be(). Execution of that term
requires a run-time check according to rule IntfMethCallS. This
check fails, and gives the error message IncompatibleClassChangeError:
class Man does not implement Thinker.
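
The run-time check made at an interface method call can be pictured with a small Java sketch. This is only an illustration under stated assumptions: the class InterfaceCallCheck, its table of loaded classes, and the method invokeInterface are hypothetical names and do not come from the model or from any JVM API; only IncompatibleClassChangeError is a real Java class.

```java
import java.util.*;

// Hypothetical miniature of the run-time check at an interface method call.
final class InterfaceCallCheck {
    // Assumed class table: class name -> interfaces the loaded class declares.
    static final Map<String, Set<String>> implemented = new HashMap<>();

    // Succeeds only if the receiver's class, as currently loaded, declares
    // the interface named at the call site; otherwise raises the linkage error.
    static String invokeInterface(String receiverClass, String iface, String method) {
        Set<String> ifaces = implemented.getOrDefault(receiverClass, Set.of());
        if (!ifaces.contains(iface)) {
            throw new IncompatibleClassChangeError(
                "class " + receiverClass + " does not implement " + iface);
        }
        return receiverClass + "." + method + "()";
    }
}
```

Registering Man with Thinker lets the call through; registering the recompiled Man with no interfaces makes the same call fail, which mirrors the last line of the output above.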
Abstract. There is a growing need to provide low-overhead software-based
protection mechanisms to protect against malicious or untrusted
code. Type-based approaches such as proof-carrying code and typed assembly
language provide this protection by relying on untrusted compilers
to certify the safety properties of machine language programs. Typed
Module Assembly Language (TMAL) is an extension of typed assembly
language with support for the type-safe manipulation of dynamically
linked libraries. A particularly important aspect of TMAL is its support
for shared libraries.



1 Introduction 

Protection of programs from other programs is an old and venerable problem, 
given new urgency with the growing use of applets, plug-ins, shareware software 
programs and ActiveX controls (and just plain buggy commercial code). His- 
torically the conventional approach to providing this protection has been based 
on hardware support for isolating the address spaces of different running pro- 
grams, from each other and from the operating system. The OS kernel and its 
data structures sit in a protected region of memory, and machine instructions 
are provided to “trap” into the kernel in a safe way to execute kernel code. 

While this approach to protection is widely popularized by operating systems 
such as Windows 2000 and Linux, there is a growing desire to find alternatives. 
The problem is that this technique is a fairly heavyweight mechanism for pro- 
viding protection, relying on expensive context switching between modes and 
between address spaces. Although application designers have learned to pro- 
gram around this expensive context switching (for example, buffering I/O in 
application space), this approach breaks down very quickly in software systems 
composed of separately authored subsystems that do not place much trust in 
each other, and where context switches may occur much more frequently than
in an OS/application scenario.

In the OS research community, investigation of alternatives has been motivated
by the demands of modular microkernel operating systems, where OS
modules outside the kernel might not be trusted. Software fault isolation (where
the loader inserts software sandboxing checks into machine code) and the
SPIN project (where type-safe OS modules are compiled by a trusted compiler)
are examples of approaches to providing protection in software rather than
hardware. Sandboxing in Java VMs has also been motivated by the expense of
hardware-based memory protection for applets. The commercial world is seeing
an explosion in the use of component technology, exemplified by Java Beans
and ActiveX controls. This use of component technology again motivates the
need to find some relatively lightweight software-based approach to protection
in running programs.

Proof-carrying code and typed assembly language are approaches
to providing this protection at low run-time cost. These approaches are
examples of self-certifying code. A compiler produces a certificate that a program
satisfies some property of interest, for example, that the program is well-typed. 
The user of a compiled program can check that the certificate supplied with a 
program is valid for that program. If the check succeeds, the program can be run 
without run-time checks for the safety properties verified by the certificate. This 
approach has the advantage of moving the compiler out of the trusted computing 
base, while reducing the need for run-time checks in the code. 

Typed assembly language (TAL) enforces a type discipline at the assembly 
language level, ensuring that malicious or carelessly written components cannot 
use “tricks” such as buffer overflows or pointer arithmetic to corrupt the data 
structures in a running program. Unlike the typed machine language underlying 
the JVM, TAL is not tied to a particular language’s type system or interpreter 
architecture. Instead the type system is a moderately generic high-level type 
system with procedures, records and parametric polymorphism, while the target 
assembly language is any modern RISC or CISC assembly language. The type 
system is designed to be rich enough that it can serve as a target for compilers 
for many different languages, while at the same time having as much freedom as 
possible in its choice of code optimizations, parameter-passing conventions, and 
data and environment representations.

Given the importance of component technology as a motivating factor for 
TAL, it is clear that there should be support for manipulating components in 
a type-safe but flexible manner. Modular Typed Assembly Language (MTAL)
extends TAL to typed object files and type-safe linking. However, this is
limited by the assumption that all of a program is linked together before the
program is run, with linking happening outside of the program itself. Dynamic 
linking may be used to avoid loading an entire library when only a small part of 
the library will be needed. For example, the Linux kernel uses dynamic linking to 
load in kernel modules on an as-needed basis. While static linkers do a good job
of only linking those parts of a library that a program references, they cannot
predict in advance which of the referenced modules a program might actually use.
Dynamic linking is also useful for shared libraries, allowing several processes to 
share a commonly used library in memory. Indeed one can consider the operating 
system itself as a shared library, one that is made available in a protected region 
of memory to all running programs. 

Our interest is in extending TAL with support for dynamic linking and shared
libraries. Glew and Morrisett consider some alternative approaches to extending
MTAL with dynamic linking, but this consideration is only informal.
One issue that they do not consider, which is central to our work, is what model
dynamic linking should use for software components and for linking components.

An obvious candidate is the ML module system, which provides fairly
sophisticated support for type-safe linking as a language construct. Indeed
this is the philosophy underlying MTAL, which relies on a phase-splitting
transformation to translate ML modules to TAL object files. However, the problem
with this approach is that it leads to two different models of linking:

1. At the source language level, linking is based on applying parameterized
modules. Higher-order parameterized modules may be useful for separate
compilation, but there are still problems with supporting
recursive modules (as are found in Java and C).

2. At the assembly language level, linking is based on a type-safe version of the
Unix ld command. Circular imports present no problem at this level, but
much of the sophistication of the type system for ML modules is lost. This is 
unfortunate, since there are many lessons to be learned from ML that could 
fruitfully be applied to develop rich linking operations for languages such as 
Java. 

This article describes Typed Module Assembly Language (TMAL), an ex- 
tension of TAL with run-time support for loading, linking and running modules. 
Work on dynamic linking has focussed on class loading in the Java virtual
machine. Java has the problem of a weak module interconnection language (MIL).
On the other hand, ML has a
powerful MIL but no support for dynamic linking. The current work was orig-
inally motivated by the desire to bridge this gap. TMAL pursues a model of 
linking that is closer to the MTAL approach than the ML approach, because it 
is closer to the form of linking used by popular languages such as Java. TMAL 
enriches the MTAL approach in several ways, drawing lessons from the ML ex- 
perience, but also limiting the ML approach in some ways that are not limiting 
for Java applications, but do avoid problems with extending ML modules to 
support Java. 

We make the following contributions to TAL: 

1. We enrich TAL with coercive interface matching, which allows a module
to be coerced to an expected type that makes some fields of the module
“private.” This is present in, for example, the ML module system, but not in
MTAL. On the other hand, ML does not provide the same linking primitives
as MTAL.

2. We enrich TAL with support for shared libraries. This is supported in the
ML module language but not in MTAL. On the other hand, ML does not
support recursive modules, which are present in MTAL and which complicate
the definition of shared libraries.

3. We extend TAL with primitives for type-safe dynamic linking. Our approach
resolves some open problems with dynamic linking and abstract data types.
In particular, because types exported by modules are named but opaque
(wholly or partially), it is not possible for run-time type checking to discern
the underlying representation type for an abstract type.

¹ Glew and Morrisett refer to “dynamic linking” as the process of linking an executable
with libraries when it is first invoked, while they refer to “dynamic loading” as the
linking and loading of libraries at an arbitrary point during execution. Our use of
the generic term dynamic linking is meant in the latter sense. We provide separate
operations for “loading” a module (reflecting it from the core language to the module
language) and for “linking” (linking together two modules).

TMAL arises out of work on a high-level module language, incorporating
ideas from ML but with application to languages such as Java, including support
for recursive DLLs and shared libraries. It can be viewed as a demonstration
of how the semantics of that module language can be incorporated into typed
assembly language. A central aspect of this scheme is the proper treatment of
shared libraries, an important issue that is addressed in the ML module language
but not in more low-level typed module languages [7,14]. A related issue
is a phase distinction in module languages² between the link-time phase of a
module and the run-time phase of a module. The link-time phase is characterized
by the application of linking operations to combine a library with other libraries. 
The run-time phase is initiated by the execution of the initialization code for 
a library, during or after which the definitions in the library are available to a 
running client. In static linking the client is always another software component 
with which the library is linked. With dynamic linking, the client is the running 
program that loads and initializes the library. This issue is not often explicitly 
acknowledged in the literature. In TMAL it is recognized by an explicit initial- 
ization operation, dlopen, that provides the demarcation point between these 
two phases in the lifetime of a module. 

In Sect. 2 we give a brief review of TAL and MTAL. In Sect. 3 we reconsider
the approach used in MTAL to represent abstract types that are exported by
typed object files, and in particular how type equality and type definitions are
handled. In Sect. 4 we give an overview of TMAL. The next four sections describe
the operations of TMAL in more detail. In Sect. 5 we describe TMAL’s support
for coercive interface matching. In Sect. 6 we describe how types and values
can be dynamically obtained from a module in TMAL. In Sect. 7 we describe
how shared libraries can be constructed in TMAL. In Sect. 8 we describe how
DLLs are loaded in a type-safe manner in TMAL. Finally, Sect. 9 provides our
conclusions.

For reasons of space, we are unable to provide a comprehensive discussion
of the various issues in module type systems that motivate some of the design
choices presented here. The reader desirous of more contextual discussion than
that presented here is referred to the work on the high-level module language
from which TMAL arises. The TMAL type system is based on a linking calculus,
itself intended as a compilation target language for that more high-level module
language. The correctness properties of the linking calculus carry over to TMAL.
There may be some interest in verifying a translation from the linking calculus
to TMAL.

² This should not be confused with the phase distinction between compile-time and
run-time explicated by Harper et al. The phase distinction between link-time
and run-time does not exist in the latter calculus, because it translates module-level
linking (functor application) to core language operations (function application and
generic instantiation).



2 Modular Typed Assembly Language 



In this section, we review Typed Assembly Language (TAL) and Modular Typed
Assembly Language (MTAL). This review is largely based on descriptions in the
literature [28,10,14]. The syntax of MTAL is given in Fig. 1.

K ∈ Kind                     ::= ty | (K1 → K2)
A, B ∈ Type Cons             ::= t | int | ∀[t1, ..., tm]Γ | ⟨A1^φ1, ..., Ak^φk⟩
φ ∈ Initialization Flag      ::= 0 | 1
Φ ∈ Type Heap Interface      ::= {t1 : K1, ..., tk : Kk}
Ψ ∈ Value Heap Interface     ::= {x1 : A1, ..., xk : Ak}
Γ ∈ Register File Type       ::= {r1 : A1, ..., rk : Ak}
Δ ∈ Type Var Context         ::= {t1 : K1, ..., tk : Kk}
h ∈ Heap Value               ::= code[t1, ..., tm]Γ.I | ⟨w1, ..., wk⟩
r ∈ Register Name            ::= r0, r1, ...
w ∈ Word Value               ::= n | x | w[A1, ..., Ak] | roll^t(w) | unroll(w)
v ∈ Small Value              ::= w | r
TH ∈ Type Heap               ::= {t1 ↦ A1, ..., tk ↦ Ak}
VH ∈ Value Heap              ::= {x1 ↦ h1, ..., xk ↦ hk}
R ∈ Register File            ::= {r1 ↦ w1, ..., rk ↦ wk}
I ∈ Instruction Sequence     ::= i1; ...; ik
i ∈ Instruction              ::= add r1, r2, v | malloc r[A] | jmp v | ...
Int ∈ Interface              ::= (Φ, Ψ)
O ∈ Object File              ::= [Int_I ⇒ (TH, VH) : Int_E]
E ∈ Executable               ::= (TH, VH, x)
P ∈ Program State            ::= (TH, VH, R, I)

Fig. 1. Syntax of MTAL

Typed Assembly Language can be explained as the result of carrying types in a
high-level language through the compilation process, all the way to the final
output of assembly language in the backend. Starting with a high-level language,
say with procedures and records, programs are translated to an assembly language
where procedures have been translated to code segments (with code for
environment-handling) and records have been translated to heap blocks. Thus,
for example, the procedure definition:

int fact (int x) {
    int y = 1;
    while (x != 0) y = (x--) * y;
    return y;
}

is translated to the code segment: 

fact: code[]{a0: int, ra: ∀[]{v0: int}}.
      mov v0, 1
      jmp loop

loop: code[]{a0: int, v0: int, ra: ∀[]{v0: int}}.
      bz  a0, ra
      mul v0, a0, v0
      sub a0, a0, 1
      jmp loop

The register ra is the continuation or return address register, pointing to the
code to be executed upon return. The fact procedure expects an integer in
the argument register a0, and returns to its caller with an integer in the value
return register v0. We use MIPS gcc calling conventions to name the registers
in examples.
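
To make the control flow of the two code segments concrete, the following Java sketch interprets them over a small register map. It is only an illustration: the class name FactSegment, the string-valued program counter, and the use of long values for int registers are assumptions made here, not part of TAL.

```java
import java.util.*;

// Hypothetical register-machine rendering of the fact/loop code segments.
final class FactSegment {
    static long run(long argument) {
        Map<String, Long> regs = new HashMap<>();
        regs.put("a0", argument);      // argument register, typed int
        String pc = "fact";            // label of the segment being executed
        while (!pc.equals("ra")) {     // jumping to ra returns to the caller
            switch (pc) {
                case "fact":
                    regs.put("v0", 1L);                              // mov v0, 1
                    pc = "loop";                                     // jmp loop
                    break;
                case "loop":
                    if (regs.get("a0") == 0) { pc = "ra"; break; }   // bz a0, ra
                    regs.put("v0", regs.get("a0") * regs.get("v0")); // mul v0, a0, v0
                    regs.put("a0", regs.get("a0") - 1);              // sub a0, a0, 1
                    break;                                           // jmp loop
                default:
                    throw new IllegalStateException("unknown label " + pc);
            }
        }
        return regs.get("v0");         // the continuation expects {v0: int}
    }
}
```

For instance, FactSegment.run(5) passes through the loop segment five times before the branch to ra is taken, leaving 120 in v0.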

In general heap values h have the form: 

1. A code segment code[t1, ..., tm]Γ.I, with register file type Γ = {r1 :
A1, ..., rn : An}. This is a code segment parameterized over m type variables
t1, ..., tm and expecting its n arguments in the argument registers r1, ..., rn.
The types of the values in the argument registers are specified in the register
file type. I is the sequence of assembly instructions for the code sequence.
This segment has the code type ∀[t1, ..., tm]Γ.

2. A heap block ⟨w1, ..., wk⟩ where the k values w1, ..., wk are word values.
Such a heap block has a heap block type ⟨A1^φ1, ..., Ak^φk⟩, where each
φh ∈ {0, 1} indicates if the hth slot has been initialized. Note that the tuple
type ⟨A1^φ1, ..., Ak^φk⟩ should not be confused with tuples of types; we do not
therefore have tuple kinds, although they could be added straightforwardly.

Parametric polymorphism is used in an essential way to abstract over the call 
stack in typing a procedure definition. For example the most general definition 
of fact is: 

fact: code[EnvT]{a0: int, sp: EnvT, ra: ∀[]{v0: int, sp: EnvT}}. ...

where the sp register points to the environment of the calling procedure. The
type parameter EnvT ensures that the continuation is passed the calling
procedure’s environment upon return.

An operational semantics is specified using program states of the form
(VH, R, I), where

1. VH = {x ↦ h} is a value heap, a mapping from labels to heap values h;

2. R = {r ↦ w} is a register file, a mapping from register names to values; and

3. I is a sequence of typed assembly instructions.

Program states are typed using register file types Γ = {r : A} and heap types
Ψ = {x : A}, where the latter maps from labels to types. The heap contents are
unordered and may contain circular references.

Modular TAL (MTAL) extends these concepts to object files for independent 
compilation and type-safe linking. An untyped object file imports some values 
and exports some values, identified by labels pointing into the object file heap. An
MTAL object file places types on the imported and exported labels. Furthermore,
to support the exportation of abstract data types, an MTAL object file imports
and exports types and type operators, identified by labels pointing into a type
heap in the object file. An object file O in MTAL has the form

[(Φ_I, Ψ_I) ⇒ (TH, VH) : (Φ_E, Ψ_E)]

where Φ_I and Φ_E are type interfaces mapping labels to kinds, Ψ_I and Ψ_E are
value interfaces mapping labels to types, TH = {t ↦ A} is a type heap mapping
labels to type and type operator definitions and VH = {x ↦ h} is a value heap
mapping labels to initial values. Φ_I and Ψ_I provide the interfaces for imported
types and values, while Φ_E and Ψ_E provide the interfaces for exported types and
values. An interface is a pair Int = (Φ, Ψ) of type and value heap interfaces.

There are three operations in the MTAL module language: 

1. Linking: O1 link O2 ⇝ O combines the object files O1 and O2 into the
single object file O. Imports in O1 and O2 may be resolved during linking.
Interface checking ensures that resolved imports have the correct type.

2. Executable formation: (O, x) ⇝ E identifies the label for executing the
code of the object file. Type-checking ensures that this label is bound in the
value heap, and that all imports have been resolved.

3. Execution of an executable: E ⇝ P produces a program state of the
operational semantics from an executable. Program states are extended to
include a type heap, and have the form (TH, VH, R, I).
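
The first two operations can be sketched in Java as follows. This is a deliberately simplified illustration, not MTAL itself: the class ObjFile and its methods are hypothetical, only value labels are modelled (no type heap), types are compared as plain strings, and conflicting types on two unresolved imports of the same label are not checked.

```java
import java.util.*;

// Hypothetical, value-level model of object files: labels map to type names.
final class ObjFile {
    final Map<String, String> imports;  // imported label -> expected type
    final Map<String, String> exports;  // exported label -> provided type

    ObjFile(Map<String, String> imports, Map<String, String> exports) {
        this.imports = imports;
        this.exports = exports;
    }

    // O1 link O2: merge the exports, then resolve imports against them.
    static ObjFile link(ObjFile o1, ObjFile o2) {
        Map<String, String> exports = new HashMap<>(o1.exports);
        for (Map.Entry<String, String> e : o2.exports.entrySet()) {
            if (exports.put(e.getKey(), e.getValue()) != null) {
                throw new IllegalArgumentException("duplicate export " + e.getKey());
            }
        }
        Map<String, String> imports = new HashMap<>();
        for (ObjFile o : List.of(o1, o2)) {
            for (Map.Entry<String, String> e : o.imports.entrySet()) {
                String provided = exports.get(e.getKey());
                if (provided == null) {
                    imports.put(e.getKey(), e.getValue());     // still unresolved
                } else if (!provided.equals(e.getValue())) {   // interface check
                    throw new IllegalArgumentException("type mismatch at " + e.getKey());
                }
            }
        }
        return new ObjFile(imports, exports);
    }

    // Executable formation: the entry label is bound and no import remains.
    static boolean isExecutable(ObjFile o, String entry) {
        return o.imports.isEmpty() && o.exports.containsKey(entry);
    }
}
```

Linking a client that imports fact with a library that exports it at the same type leaves no unresolved imports, so the result passes the executable-formation check, while the client on its own does not.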

3 Type Heap Reconsidered 

Before giving a description of TMAL, it is useful to explain how our treatment 
of the type heap and type identity differs from that of MTAL. In MTAL there 
are two views of a type: 

1. Within an object file, a type exported by that object file is completely
transparent. The definition of a type label is given by its binding in the type heap,
TH. Because the type heap may contain circular bindings, there are word
value operations unroll(w) and roll^t(w) that unfold and fold the definition
of a type label in the type of w, respectively. For example, if a file system
module defines a file abstract type t as ⟨int^1, int^1⟩, and w is a word value
with this type, then roll^t(w) gives a word value with type t, that is, with the
concrete type folded to the defined type. This means that all types defined
in object files are datatypes. In other words, there is no equality theory for
implicitly unfolding the definitions of type identifiers exported by object
files, so type equivalence for such type identifiers is based on name
equivalence rather than structural equivalence.

2. Outside of an object file, a type exported by that object file may be
transparent or opaque. The interface only provides the kind, and the type heap is
only visible within the module. Hicks et al. use a module system similar
to MTAL, except that they also allow an object file to export some of its
type definitions, so types may be made transparent to clients.

The advantage of requiring all defined types to be datatypes is that recursive
types are assured to be iso-recursive types³, thus greatly simplifying the problem
of type-checking. The problem with this approach is that it does not adequately
handle type sharing for shared libraries. This is explained in more detail in [11].
Consider for example the following Objective ML code:

module type S = sig type t; val x:t end
module S1 : S = struct type t = C; val x = C end
module S2 : (S where type t = S1.t) = S1
if true then S1.x else S2.x

The module S1 defines a datatype t with single constructor C, and binds the field
x to this constructor. The last conditional type-checks because S2.x has type
S2.t, and the type of S2 includes the constraint t = S1.t, which is also the type
of S1.x. The structure S1 is an example of a shared library, in the sense that
the identity of its (abstract) type component S1.t is shared with S2.t. The
datatype restriction, on the other hand, requires the insertion of marshalling 
and unmarshalling code at the interface of a shared library, severely curtailing 
its usability. An example is provided in [12] . 

It is informally mentioned in the description of MTAL that the implementation
includes singleton kinds to expose type definitions to clients of object
files. However, this is not formalized in the type system and therefore several
important issues are left unresolved. For example, it is not hard, using singleton
kinds, to define two mutually recursive types in separate object files, and linking
those files then results in equi-recursive types. This problem can be avoided
by only allowing singleton kinds to contain type labels, where the definitions
remain encapsulated in the type heap in the object file. In terms of the type
system presented here, this amounts to only allowing type sharing constraints
in the interface, and not allowing type definitions to be exposed.

³ Harper, Crary and Puri make the distinction between iso-recursive and
equi-recursive types. The latter require an equality theory for types that includes a rule
for implicitly unrolling a recursive type. The former do not require this equality,
and instead rely on operations in the language for explicitly folding and unfolding
recursive types.

In our type system we allow both exposure of type definitions and type
sharing to be expressed in module interfaces, without allowing
equi-recursive types in the type system. This is done by separating these two
uses of type information in the interface:

1. Exposure of type definitions is expressed using box kinds. Box kinds differ 
from singleton kinds in the following way: whereas singleton kinds allow 
implicit equality of a type identifier with the type in its singleton kind, box 
kinds require explicit coercions in the term language between a type identifier 
and the type in its box kind. 

2. Type sharing is expressed using type sharing constraints. The type system 
includes an equality theory that is merely the congruence closure of an equal- 
ity between type identifiers defined by a context of type sharing constraints. 
Since equality is only between identifiers, there is no problem with analysing 
recursive constraints. This is particularly important when we consider dy- 
namic type-checking of DLLs. 
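
Because sharing constraints only relate identifiers, deciding whether two identifiers are equal reduces to an equivalence-class computation. The following Java sketch uses a union-find structure for this; it is an illustration only, the class SharingContext and its methods are hypothetical names, and it covers equality between identifiers but not the lifting of that equality through type constructors.

```java
import java.util.*;

// Hypothetical checker for equality of type identifiers under sharing constraints.
final class SharingContext {
    private final Map<String, String> parent = new HashMap<>();

    // Representative of the equivalence class of identifier t.
    private String find(String t) {
        String p = parent.getOrDefault(t, t);
        if (p.equals(t)) {
            return t;
        }
        String root = find(p);
        parent.put(t, root);            // path compression
        return root;
    }

    // Record a sharing constraint t = u.
    void share(String t, String u) {
        String rt = find(t);
        String ru = find(u);
        if (!rt.equals(ru)) {
            parent.put(rt, ru);
        }
    }

    // Two identifiers are equal only if the recorded constraints force it.
    boolean equal(String t, String u) {
        return find(t).equals(find(u));
    }
}
```

Constraints that refer to each other circularly, such as S2.t = S1.t together with S1.t = S2.t, simply land in the same class, so the check terminates without any unrolling of definitions.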

TMAL replaces the roll^t and unroll operations of MTAL with operations
for constructing and deconstructing values of types with box kind:

                      Introduction          Elimination
MTAL Expression       roll^t(w)             unroll(w)
MTAL Side-Condition   w : A, TH(t) = A      w : t, TH(t) = A
TMAL Expression       fold_t(w)             unfold_t(w)
TMAL Side-Condition   t : ⊠A, w : A         t : ⊠A, w : t



Because the TMAL operations are typed independently of the type heap, box 
kinds can be used to expose type definitions in the interface of an object file. In 
contrast with singleton kinds, because explicit coercions are required between a 
type with box kind and the type in its kind, recursive types are guaranteed to 
be iso-recursive types. 



4 Typed Module Assembly Language 

Fig. 2 provides the syntax of Typed Module Assembly Language. In comparison
with MTAL, the major changes in module interfaces are:

1. We enrich kinds with box kinds ⊠A. For simplicity we only consider simple
types in this account. Box kinds generalize to type operators with some care.

2. We enrich import and export interfaces Int with a type sharing context S.
This is a set of equality constraints between type identifiers.

K ∈ Kind                     ::= ty | ⊠A
A, B ∈ Type Cons             ::= t | int | ∀[t1 : K1, ..., tm : Km]Γ
                               | ⟨A1^φ1, ..., Ak^φk⟩ | ⟨⟨⟩⟩ | OT | Int
φ ∈ Initialization Tag       ::= 0 | 1
Φ ∈ Type Heap Interface      ::= {t1 :: t1 : K1, ..., tk :: tk : Kk}
Ψ ∈ Value Heap Interface     ::= {x1 :: x1 : A1, ..., xk :: xk : Ak}
S ∈ Type Sharing Cons        ::= {t1 = t1' ∈ K1, ..., tk = tk' ∈ Kk}
Γ ∈ Register File Type       ::= {r1 : A1, ..., rk : Ak}
Δ ∈ Type Var Context         ::= {t1 : K1, ..., tk : Kk}
h ∈ Heap Value               ::= code[t1 : K1, ..., tm : Km]Γ.I
                               | ⟨w1, ..., wk⟩ | ⟨⟨w, OT⟩⟩ | O | ST
r, r^m, r^s ∈ Register Name  ::= r0, r1, ...
w ∈ Word Value               ::= n | x | w[A1, ..., Ak]
v ∈ Small Value              ::= w | r
TH ∈ Type Heap               ::= {t1 :: t1 : K1 B1^t, ..., tk :: tk : Kk Bk^t}
B^t ∈ Type Binding           ::= = A    (Type Definition)
                               | = t    (Shared Type Binding)
VH ∈ Value Heap              ::= {x1 :: x1 : A1 B1^v, ..., xk :: xk : Ak Bk^v}
B^v ∈ Value Binding          ::= = h    (Value Definition)
                               | = x    (Shared Value Binding)
R ∈ Register File            ::= {r1 ↦ w1, ..., rk ↦ wk}
ρ ∈ Renaming Substitution    ::= {n1 ↦ n1', ..., nk ↦ nk'}
I ∈ Instruction Sequence     ::= i1; ...; ik
i ∈ Instruction              ::= add r1, r2, v | malloc r[A] | jmp v | ...
Int ∈ Interface              ::= (Φ, Ψ, S)
OT ∈ Object File Type        ::= [Int_I ⇒ Int_E]
O ∈ Object File              ::= [Int_I ⇒ (TH, VH) : Int_E]
ST ∈ Symbol Table            ::= {t ↦ t, x ↦ y}
P ∈ Program State            ::= (TH, VH, R, I)

Fig. 2. Syntax of TMAL

Purpose                       Instruction                    Semantics
Linking, interface matching   dllink r1^m, r2^m, r3^m        Link modules
                              dlcoerce r1^m, r2^m, OT        Coerce to interface
                              dlrename r1^m, r2^m, ρ         Rename external labels
Dynamic imports               dlopen r^s, r^m                Initialize module
                              dlsym_t [t : K] r1, r2, t      Import type
                              dlsym_v r, r^s, x              Import value
Shared definitions            dlsetsym_t r1^m, r2^m, A, t    Set shared type
                              dlsetsym_v r1^m, r2^m, v, x    Set shared value
Dynamic linking               dldynamic r, v, OT             Construct DLL
                              dlload r^m, r1, r2, OT         Extract module

Fig. 3. Summary of TMAL instructions



3. To support coercive interface matching, we add external labels to type and 
value heap interfaces. As explained in the next section, this allows some of 
the fields in a module to be safely made private, whereas allowing private 
fields in MTAL leads to the possibility of run-time name clashes. 

There are two forms of module values in TMAL: 

1. Modules or object files O = [Int_I ⇒ (TH, VH) : Int_E]. This defines a
type heap TH and a value heap VH, that may be linked with other such
heaps using the TMAL operations. Int_I = (Φ_I, Ψ_I, S_I) is the interface of
symbols imported by the module, while Int_E = (Φ_E, Ψ_E, S_E) is the interface
of symbols exported to clients of the module.

2. Symbol tables ST = {t ↦ t, x ↦ x}. A symbol table arises from the
initialization of a module. Initializing a module adds its type and value definitions
to the type and value heaps, respectively, of the running program. The sym- 
bol table provides mappings from the external labels of the module to the 
heap addresses of its definitions. TMAL provides operations for dynamically 
importing these addresses into a running program, using a symbol table to 
perform a run-time lookup based on external labels. 

A type heap definition, which binds a type name together with its external
name and kind, has one of two forms:

1. A definition of the form t :: t : K = A defines a branded type t with external
name t and definition A. External names are explained in the next section.
The most general kind for such a type is KIA, revealing the structure of the
type definition. This is a subkind of ty, the kind of simple types that makes
type definitions opaque.

2. A definition of the form t :: t : K = t' defines a shared type t that is equated
to the type t'. Such a type sharing definition can be exposed in an interface
by a type sharing constraint $t = t' \in K$.

Similarly, a value heap definition has one of the two forms x :: x : A = h
(analogous to a value heap definition $x \mapsto h$ in MTAL) or x :: x : A = y
(a value sharing definition). Module initialization transforms a value sharing
definition into a value heap definition x :: x : A = h by looking up the definition of
y in the heap. Initialization may detect circular value sharing definitions, which
correspond to values with no clearly defined initial values.
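
The resolution of value sharing definitions during initialization can be sketched as follows. This is a minimal illustration in Python, not the TMAL semantics itself: the heap encoding (a dictionary from internal names to either `("val", h)` or `("alias", y)`) and the function name `resolve_sharing` are assumptions made for the example.

```python
def resolve_sharing(heap):
    """Resolve value sharing definitions in a hypothetical heap encoding.

    heap maps an internal name to ("val", h) for an ordinary definition
    or ("alias", y) for a value sharing definition x = y.
    Returns a dict mapping every name to its resolved heap value.
    """
    resolved = {}
    for name in heap:
        seen = []          # names visited while chasing this chain of aliases
        current = name
        while heap[current][0] == "alias":
            if current in seen:
                # A cycle means the value has no well-defined initial value.
                chain = " -> ".join(seen + [current])
                raise ValueError(f"circular value sharing: {chain}")
            seen.append(current)
            current = heap[current][1]
        resolved[name] = heap[current][1]
    return resolved
```

For instance, `resolve_sharing({"x": ("val", 1), "y": ("alias", "x")})` maps both names to the same value, while a heap in which `a` aliases `b` and `b` aliases `a` is rejected.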

Are first-class modules necessary for dynamic linking? In TMAL, modules
are manipulated (loaded, coerced and linked) at run-time. This in itself does not
necessarily require modules as first-class values, and indeed TMAL is based on
a module language where there is a strict separation between module values and
simple values [?]. Nevertheless, a critical part of the transition from a high-level
language to TAL is closure conversion, where environment slots are allocated for
local variables in a procedure, and the contents of the register file are saved to
the environment on a procedure call. Since some local variables may be bound
to module values, it is therefore necessary in TMAL to make modules into
first-class values. For example, the kernel language described in [?] includes a letmod
construct for binding a local module identifier to a module:

letmod s = Mod in Expr 

where Mod is a module language expression and Expr a core language expression. 
Closure conversion then requires that an environment slot be allocated for the 
free module identifier s, leading to the need for first-class modules. 

This potentially has some unpleasant consequences. For example, Lillibridge
[?] has demonstrated that type-checking is undecidable for a type system with
first-class modules. The source of this undecidability is a subtype relation
between modules that allows fields to be made private, and allows type definitions
to be made opaque. There is no such subtype relation in the core language of
TMAL, and therefore no such subtyping for modules. This makes “first-class”
modules in TMAL strictly less powerful than general first-class modules. For
example, with general first-class modules, it is possible for the two arms of a
conditional to return modules with different interfaces, by having the result interface
contain the intersection of the fields of the two modules. However, the weak type
system for modules in TMAL is sufficient for the purposes of closure conversion,
and avoids the undecidability problems with more general type systems.

Rather than allowing type subsumption for modules, TMAL has a dlcoerce 
instruction for explicitly coercing a module to a required type. This coercion 
operation requires that the module’s type be a subtype of the required type: 



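$$OT \le OT' \iff OT = [Int_I \Rightarrow Int_E],\ OT' = [Int'_I \Rightarrow Int'_E],\ Int'_I \le Int_I \text{ and } Int_E \le Int'_E$$

$$Int \le Int' \iff Int = (\Phi, \Psi, S),\ Int' = (\Phi', \Psi', S'),\ \Phi \le \Phi',\ \Psi \le \Psi', \text{ and } S \text{ entails } S'$$

$$\Phi \le \Phi' \iff \Phi = \{t_k :: t_k : K_k\},\ \Phi' = \{t_m :: t_m : K'_m\},\ k \ge m,\ K_m \le K'_m$$

$$\Psi \le \Psi' \iff \Psi = \{x_k :: x_k : A_k\},\ \Psi' = \{x_m :: x_m : A'_m\},\ k \ge m,\ A_m = A'_m$$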

So interface containment reduces to kind containment (where the only containments
are of the form KIA ≤ ty) and equality between types. This equality
relation includes entailment based on sharing constraints in the context. Sharing
constraints can only relate type identifiers, so the equality relation includes
rules for forming the congruence closure of these equalities. Because sharing
constraints can only relate type identifiers, it is straightforward to extend the
language of types with type operators (λ-abstraction) and β-conversion of types.
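
The containment check can be sketched in Python. This is only an illustration under several assumptions that are not part of TMAL: interfaces are triples `(phi, psi, sharing)` of dictionaries keyed by external label plus a list of identifier pairs, kinds are either `"ty"` or a tuple standing for a transparent definition, types are nested tuples of identifiers, and type equality is decided using the sharing constraints of the interface being checked. All function names are hypothetical.

```python
def find(parent, name):
    """Follow parent links to the representative of name's equivalence class."""
    while parent.get(name, name) != name:
        name = parent[name]
    return name


def closure(sharing):
    """Build equivalence classes from sharing constraints (pairs of identifiers)."""
    parent = {}
    for a, b in sharing:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b
    return parent


def entails(sharing, required):
    """True if every required equality follows from the given constraints."""
    parent = closure(sharing)
    return all(find(parent, a) == find(parent, b) for a, b in required)


def normalize(parent, ty):
    """Replace every identifier in a nested type by its class representative."""
    if isinstance(ty, tuple):
        return tuple(normalize(parent, part) for part in ty)
    return find(parent, ty)


def kind_leq(kind, required):
    # Assumed kind language: equal kinds, or any kind below the opaque kind "ty".
    return kind == required or required == "ty"


def interface_leq(intf, required):
    """Int <= Int': each required field is present with a compatible kind or equal type."""
    phi, psi, sharing = intf
    phi_req, psi_req, sharing_req = required
    parent = closure(sharing)
    types_ok = all(l in phi and kind_leq(phi[l], k) for l, k in phi_req.items())
    values_ok = all(
        l in psi and normalize(parent, psi[l]) == normalize(parent, a)
        for l, a in psi_req.items()
    )
    return types_ok and values_ok and entails(sharing, sharing_req)


def module_leq(ot, required):
    """OT <= OT': imports are contravariant, exports are covariant."""
    (imports, exports), (imports_req, exports_req) = ot, required
    return interface_leq(imports_req, imports) and interface_leq(exports, exports_req)
```

A module exporting `File`, `Tmp` and `open` is below a type that asks only for `File` and `open`, but not the other way round; a module with an unresolved import is not below a type that promises no imports.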

The type formation rules for modules (object files) and symbol tables are
provided in App. A. These operations are discussed in Sect. [?] and formally
specified in App. [?].



5 Coercive Interface Matching 

MTAL assumes that all field names are globally defined, and interface matching 
is based on these global field names. Any “implicit” renaming of an identifier 
requires it to be rewritten globally. There is no notion (as in our approach) of 
differentiating between external and internal names, with internal names locally 
bound, and therefore allowing local renaming of these internal names to avoid 
name clashes during linking. In the MTAL approach, if two modules have fields 
with the same name, these names are references to the same global symbol, 
and any renaming of the symbol must be performed in both modules. As a 
consequence, if fields of an object file are made private in MTAL, there is no 
way to rename the private fields in order to avoid name clashes when this object 
file is linked with other object files. 

We want to support run-time linking where a library is loaded from disk into 
the program address space and linked with other libraries. Type safety requires 
a run-time type check at some point in this scenario. This type check requires 
that the labels do not admit implicit renaming (such as alpha-conversion in the 
lambda-calculus). We do not expect that all labels of the loaded library are 
known, only those labels specified in the expected interface in the run-time type 
check. Following the MTAL approach, there is the potential for confusion of 
labels because some of the “hidden” labels in the loaded library may be the 
same as labels in the libraries it is linked with. 

This is the motivation for generalizing labels in type and value heap interfaces
to include external names t and x. Type and value heap interfaces have the form

$$\Phi = \{t :: t : K\} \quad\text{and}\quad \Psi = \{x :: x : A\}$$

The internal names t and x represent local (type and value) heap addresses.
These names admit implicit renaming or alpha-conversion, corresponding to
relocating symbols in a heap. The external names t and x represent external labels
that allow reference to the internal contents of a heap component of a module
from outside. To allow fields of a module to be made private, external type and
value names in type and value heaps include the special symbol *, the name of
a private field. Fields in a module are made private using the dlcoerce
instruction, which changes the external names of fields made private to *. The private
external name * should never appear in a type or value heap type.

Before the contents of a module can be used by a running program, its heaps 
must be combined with the program heaps. This combination ensures that the 
internal labels of the module heaps are distinct from the internal labels in the 
program heaps. 

Following [?] we provide three operations for combining and adapting modules.
The choice of these operations is informed by an analogy between module
combination and process composition in process algebras such as CCS [?].

Operation    TMAL                               CCS
Linking      dllink $r^m_d, r^m_1, r^m_2$       $(P \mid Q)$
Coercion     dlcoerce $r^m_d, r^m_s, OT$        $(P \setminus x)$
Renaming     dlrename $r^m_d, r^m_s, \rho$      $P[\rho]$



The dllink instruction links together two modules, combining the type and
value heaps. The modules being linked together are in the source registers $r^m_1$ and
$r^m_2$, and the result of linking is left in the destination register $r^m_d$. The exports of
the resulting module are the union of the exports of the two modules, while the
imports are the union of the imports of the linked modules minus any imports
that are resolved by linking. To obtain a coherent result, the type rules require
that the external labels of the exports of the two linked modules are distinct. To
maintain this restriction, the external labels of a module must always be visible
in the type of the module. The linking operation also requires that the internal
labels of the exports of the modules be distinct. Since internal names are bound
within a module, they can be renamed to avoid name clashes when merging the
fields of the modules being linked. In a concrete implementation, this renaming
is handled straightforwardly by relocating the internal addresses of two object
files that are linked together.
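
The behaviour of linking described above can be sketched as follows. This is an illustrative Python model, not the TMAL instruction: the module encoding (a dictionary with a set of imported labels, a map from exported external labels to internal names, and a heap keyed by internal names) and the helper `relocate` are assumptions, heap contents are treated as opaque values, and private fields are left out.

```python
import itertools

_fresh = itertools.count()  # source of fresh suffixes for internal names


def relocate(module):
    """Alpha-rename every internal name of a module to a fresh one."""
    renaming = {name: f"{name}_{next(_fresh)}" for name in module["heap"]}
    return {
        "imports": set(module["imports"]),
        "exports": {label: renaming[name] for label, name in module["exports"].items()},
        "heap": {renaming[name]: value for name, value in module["heap"].items()},
    }


def dllink(m1, m2):
    """Link two modules: union of exports, imports minus those resolved."""
    clash = set(m1["exports"]) & set(m2["exports"])
    if clash:
        # External labels of the exports must be distinct.
        raise ValueError(f"exported labels clash: {sorted(clash)}")
    # Internal names are bound, so relocate them to avoid clashes.
    m1, m2 = relocate(m1), relocate(m2)
    exports = {**m1["exports"], **m2["exports"]}
    imports = (m1["imports"] | m2["imports"]) - set(exports)
    return {"imports": imports, "exports": exports, "heap": {**m1["heap"], **m2["heap"]}}
```

Two modules that happen to use the same internal name can still be linked, because relocation separates their heaps; only a clash between exported external labels is an error.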

The dlcoerce instruction is necessary because of the absence of a subsumption
rule based on interface containment for modules. Such a subsumption
rule is not allowable because of the requirement that the external labels of a
module must always be visible in its type. The coercion operation performs a
run-time adaptation of a module, removing some of its external labels. The
corresponding definitions are no longer visible to external clients of the module, but
are still accessible via their internal labels to other definitions within the module.
The source module is in register $r^m_s$, while the result of coercion is left in the
destination register $r^m_d$. The type to which the module is coerced is specified by
the object file type OT. This type annotation is mostly only for type-checking,
and can be removed before execution. The part of the annotation that must be
preserved during execution is the association between external names and
internal names; TMAL includes instructions for looking up a field in an initialized
module based on its external name.

The dlrename instruction is a second operation for coercive interface matching,
and renames some of the external labels in a module. A renaming substitution
ρ is an injective mapping from external labels to external labels. Since
external names are used at run-time, this renaming substitution must be applied at
run-time.
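
The two coercive operations can be sketched on a simplified export list. This Python fragment is an assumption-laden illustration: exports are modelled as a list of `(external label, internal name)` pairs, the set `keep` stands in for the labels mentioned by the target type OT, and the function names merely echo the instruction names.

```python
PRIVATE = "*"  # the special external name of a private field


def dlcoerce(exports, keep):
    """Hide every exported field whose external label is not in keep."""
    missing = set(keep) - {label for label, _ in exports}
    if missing:
        raise ValueError(f"module does not export: {sorted(missing)}")
    # Hidden fields keep their internal name but lose their external label.
    return [(label if label in keep else PRIVATE, name) for label, name in exports]


def dlrename(exports, rho):
    """Apply an injective renaming rho to the visible external labels."""
    if len(set(rho.values())) != len(rho):
        raise ValueError("renaming substitution is not injective")
    renamed = [
        (label if label == PRIVATE else rho.get(label, label), name)
        for label, name in exports
    ]
    visible = [label for label, _ in renamed if label != PRIVATE]
    if len(set(visible)) != len(visible):
        raise ValueError("renaming introduces a clash between external labels")
    return renamed
```

Coercing `[("open", "o"), ("helper", "h")]` to the label set `{"open"}` leaves `helper` reachable only through its internal name, and a subsequent renaming of `open` to `fs_open` touches only the visible label.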

6 Dynamic Imports 

The instructions given in the previous section operate on values at the module
language level. At the heart of the TMAL approach are the instructions that
connect the module language level to the core language level. In the module
language described in [?] this connection is provided by an init operation
that initializes a module and introduces its definitions into a local scope in
a core language program. In TMAL the init operation is realized by three
instructions, for initializing a module and for importing its definitions into the
scope of a running thread:

Operation           TMAL
Initialize module   dlopen $r^{st}, r^m$
Import type         dlsym_t $[t : K]\ r^{st}_d, r^{st}_s, t$
Import value        dlsym_v $r, r^{st}, x$



These operations allow a program to import some of the symbols from a DLL, 
using the external labels of a DLL to access its definitions. 

The dlopen instruction expects a pointer to a module in register $r^m$. The
instruction initializes the module, adding its type and value heap bindings to the
program heaps, and building a symbol table with mappings from the module
fields to their bindings in the program heaps. A pointer to this symbol table is
left in register $r^{st}$.

The dlsym_t instruction imports a type definition into the local context of 
the current thread, while the dlsym_v instruction imports a value definition. 
The dlsym_t operation imports a type symbol from a DLL into a register, using 
the external label of the type symbol and the symbol table of the DLL to map 
to the internal label. Note that the internal label cannot be known statically; 
the internal label is chosen at the point where the DLL is initialized and its 
value definitions are added to the program’s value heap. This is in contrast with 
MTAL, where heap locations are referenced by globally bound internal names, 
and where renaming to avoid name clashes is not possible. In TMAL, the internal 
label is chosen so that there is no clash with the labels already given to program 
heap contents. Since the complete contents of the program heap are not known 
until run-time, there is no way to know the internal label during type-checking. 
The dlsym_t instruction expects a pointer to a symbol table in
register $r^{st}_s$, and specifies the external name t of the type symbol to be imported
in the local thread address space. The instruction binds the type parameter t in
subsequent instructions to the type heap address corresponding to this external
name. The symbol table type must be modified so that references to the global
heap address are rebound to this local type parameter, for type-checking
subsequent importations of definitions that rely on this type symbol. Register $r^{st}_d$ is
left with a pointer to a symbol table of this modified type.

The dlsym_v operation imports (the heap address of) a value definition from 
a DLL into a register, using the external label of the value definition and the 
symbol table of the DLL to map to the internal label. The external name x of 
the definition is specified in the instruction, and the instruction leaves the value 
heap address of the definition in the value register r. 



// Assume s1 points to loaded file system module
dlopen   s2, s1                    // Initialize module
dlsym_t  [FileT:ty] s3, s2, File   // Import file type
dlsym_v  s4, s3, open              // Import file open operation
mov      a0, file_name             // Load file name
mov      ra, retpt[FileT]          // Load continuation
jmp      s4[EnvT]                  // Jump to file open operation

retpt:      code [FileT] {v0 : FileT, sp : EnvT} ...
file_name:  "/etc/passwd"



Fig. 4. Example of dynamic imports 



Fig. 4 gives an example of the use of these operations. Assuming the s1
register points to a module, the dlopen instruction initializes that module, adding
its type and value heap definitions to those of the running program. The result of
initialization is a pointer, in the s2 register, to a symbol table mapping from the
external labels of the module to the addresses of its definitions in the program
heaps.

The important proviso in the dlsym_v operation is that none of the free type
variables in the type of a value definition are bound by the type heap definitions
addressed by the symbol table. For example, recalling the example in Fig. 4,
assume that the symbol table resulting from initializing the file system module
has type:

type File::File : ty
val open::open :
    ∀[EnvT:ty]{a0 : String, sp : EnvT, ra : ∀[]{v0 : File, sp : EnvT}}

The abstract file type File occurs free in the type of the open operation.
Therefore the dlsym_v instruction cannot import this definition immediately. The
reason is that the register file type resulting from this importation would have
no binding for the type identifier File in the type of the v0 register.

In order to import the open operation, the type identifier File that occurs
free in its type must first be imported from the DLL. This is done using the
dlsym_t operation. In the example in Fig. 4, the dlsym_t instruction binds a
local type identifier FileT to the abstract type File defined by the DLL. The
s3 register is bound to a new symbol table with type:

val open::open :
    ∀[EnvT:ty]{a0 : String, sp : EnvT, ra : ∀[]{v0 : FileT, sp : EnvT}}



The abstract file type in the type of the open operation has been relocated to a 
type bound in the local context of the current thread, therefore it is now possible 
to import the open definition from the DLL. 
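
The ordering constraint between dlsym_t and dlsym_v can be modelled in a few lines of Python. This is a sketch of the proviso only, under assumptions of our own: a symbol table is a dictionary with a `types` map and a `values` map, a value's type is flattened to a tuple of the type identifiers it mentions, and the static check of the type system is modelled as a run-time error.

```python
def dlsym_t(symtab, label, local_name):
    """Import a type: rebind its occurrences to a local type parameter."""
    if label not in symtab["types"]:
        raise KeyError(f"no type with external label {label!r}")
    # The imported type is no longer bound by the symbol table type.
    types = {l: k for l, k in symtab["types"].items() if l != label}
    values = {
        l: (addr, tuple(local_name if t == label else t for t in ty))
        for l, (addr, ty) in symtab["values"].items()
    }
    return {"types": types, "values": values}


def dlsym_v(symtab, label):
    """Import a value, provided its type mentions no type still bound by the table."""
    addr, ty = symtab["values"][label]
    unbound = [t for t in ty if t in symtab["types"]]
    if unbound:
        raise TypeError(f"import these types first: {unbound}")
    return addr, ty
```

With a table holding the abstract type `File` and an `open` value whose type mentions `File`, calling `dlsym_v` directly fails, whereas calling `dlsym_t(symtab, "File", "FileT")` first yields a table from which `open` can be imported at a type mentioning `FileT`.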



7 Shared Libraries 

Heaps in modules may contain shared type bindings t :: t : K = t' and shared
value bindings x :: x : A = y. If all linking is performed before a program runs,
then shared bindings are unnecessary. However shared bindings become crucial 
in an environment where modules are initialized at run-time. 

For example, consider a module implementing a network protocol. This im- 
plementation requires some operations and types that are only provided by the 
operating system. Module linking can be used to combine these modules into a 
single module implementing the operating system with that protocol: 

// Assume s1 points to loaded OS module
// Assume s2 points to loaded protocol module
dllink   s3, s1, s2               // Link OS, protocol modules
dlopen   s4, s3                   // Initialize module
dlsym_t  [Conn:ty] s5, s4, Conn   // Import connection type
dlsym_v  s6, s5, open             // Import conn open operation

However there is a difficulty with this approach: the operating system will 
have already been initialized when the program runs. In fact the operating sys- 
tem is really the first module to be initialized, and a running program is just 
another module that has been loaded and initialized by code defined in the 
operating system module. 

Similar remarks apply to access to OS operations from a process. The process
must somehow have access to labels into the OS type and value heaps¹, but it is
unrealistic to expect a program to be linked with its own copy of the OS module
before execution can begin. The OS is one example of a shared library, a library
that is loaded and initialized once, and that is subsequently available to other
libraries as they are loaded.

¹ As mentioned in Sect. [?], approaches such as typed assembly language should be
regarded as an alternative to current heavyweight protection mechanisms such as
hardware-based memory protection and the use of library stubs to trap to the OS.

The following instructions allow a program to construct a shared library: 



Operation          TMAL
Set shared type    dlsetsym_t $r^m_d, r^m_s, t, t$
Set shared value   dlsetsym_v $r^m_d, r^m_s, r, x$



The dlsetsym_t instruction allows a reference to a type to be added to the
export list of a module, while the dlsetsym_v instruction allows a reference to a
value to be added. These are not the only ways in which shared type and value
definitions can be constructed. For example, the initial value heap in a module
may contain the definition of another module (manipulated by the parent module 
at run-time) that has aliases for value and type bindings, where the child module 
definitions that are shared are bound in the parent module heaps, or are imported 
or exported by the child module. However the aforesaid instructions are the only 
way to introduce aliases into a module, for shared bindings that are not available 
until run-time. For example, they are the only way to add bindings for OS types 
and operations into a module that requires those OS definitions. Once such a 
shared library has been constructed, the dllink instruction allows it to be linked 
with other modules. 

The dlsetsym_t instruction expects a pointer to a module in register $r^m_s$,
and a pointer into the type heap in type “register” t. The module should import
a type definition with external label t and with a kind compatible with t. The
instruction moves the type field with label t from the import list of the module
to the export list, binding the field to the type heap pointer given by t, and
the resulting module is given in register $r^m_d$.

The dlsetsym_v instruction expects a pointer to a module in register $r^m_s$, and
a heap address in register r. The field labelled with x in the module is moved
from the import list to the export list, bound to the value heap pointer in r, and
a pointer to the resulting module is left in register $r^m_d$.
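
The effect of both instructions on a module's import and export lists can be sketched with one Python helper. The encoding is hypothetical: imports map an external label to the kind or type expected, a heap pointer is a pair of an address and its kind or type, and compatibility is simplified to equality, whereas TMAL uses kind compatibility and type equality modulo sharing.

```python
def dlsetsym(module, label, pointer):
    """Move an imported field to the export list, bound to a shared heap pointer."""
    address, spec = pointer
    if label not in module["imports"]:
        raise KeyError(f"module does not import {label!r}")
    if module["imports"][label] != spec:
        # Simplification: compatibility is modelled as plain equality.
        raise TypeError(f"{label!r} expects {module['imports'][label]!r}, got {spec!r}")
    imports = {l: s for l, s in module["imports"].items() if l != label}
    exports = dict(module["exports"])
    exports[label] = ("shared", address)  # an alias, not a new definition
    return {"imports": imports, "exports": exports}
```

Applying the helper once for a `ProtId` type pointer and once for a `deliver` value pointer leaves a module with no remaining imports, which can then be initialized.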

Returning to the example above of a protocol module, suppose that this
module requires a type ProtId of protocol identifiers and an operation deliver
from the OS. The latter operation is used by this protocol module to deliver a
protocol data unit to the next protocol above it in the protocol stack.



// Assume s1 points to initialized OS module
// Assume s2 points to loaded protocol module (PM)
dlsym_t     [ProtId:ty] s1, s1, ProtId   // Import prot id type
dlsym_v     s3, s1, deliver              // Import deliver operation
dlsetsym_t  s2, s2, ProtId, ProtId       // Export protocol id to PM
dlsetsym_v  s2, s2, s3, deliver          // Export deliver to PM
dlopen      s4, s2                       // Initialize PM
Alternatively, if the code for initializing the protocol module is in the OS
itself, then this code can be defined as:

// Assume s2 points to loaded protocol module (PM)
dlsetsym_t  s2, s2, ProtId, ProtId     // Export protocol id to PM
dlsetsym_v  s2, s2, deliver, deliver   // Export deliver to PM
dlopen      s4, s2                     // Initialize PM

where ProtId and deliver are direct references into the type and value heaps,
respectively, in the module implementing the OS.

Continuing the example above of assigning the ProtId and
deliver fields of a protocol module, assume that the protocol module has type:

import type ProtId::ProtId
import val deliver::deliver : ∀[EnvT]{a0 : ProtId, ...}
export type Conn::Conn
export val open::open : ∀[EnvT]{a0 : String, ...}

Then setting the ProtId field with the ProtId type defined in the OS module
results in a module with type:

export type ProtId1::ProtId
import val deliver::deliver : ∀[EnvT]{a0 : ProtId1, ...}
export type Conn::Conn
export val open::open : ∀[EnvT]{a0 : String, ...}
sharing type ProtId1 = ProtId

In this case the internal type name ProtId1 is a renaming of the internal name
for the type of protocol identifiers, so as to avoid a name clash with the internal
type name ProtId in the OS module. If the OS module has a value heap label
deliver with type

∀[EnvT]{a0 : ProtId, ...}

then the type sharing constraint allows this type to be equated with the type
of the deliver heap label in the protocol module. This allows the dlsetsym_v
instruction to be used to assign this value field.

8 Dynamic Loading 

The final set of instructions are used to attach run-time type information to a 
DLL. This type information is used in a run-time type check, to ensure that a 
DLL that is loaded from disk or from the network has the required module type. 
There is an instruction dldynamic for bundling a value with a type description, 
and another instruction dlload for checking that a DLL has a specified type. 



Operation        TMAL
Construct DLL    dldynamic $r, v, OT$
Extract module   dlload $r^m, r_1, r_2, OT$
The type expression ⟨⟨ ⟩⟩ denotes the type of a DLL. The dldynamic instruction
associates a type descriptor OT with the heap address of a module in a DLL
value ⟨⟨w, OT⟩⟩, of DLL type. To be completely accurate, object files should be
stripped of unnecessary type information before run-time. Then the only places
where type information is required are (a) the external labels of import and
export lists (since these labels are used by various instructions to look up fields in
modules and symbol tables), and (b) the dldynamic and dlload instructions
above, and DLL values. We forgo specifying this type-erasure semantics for lack
of space.

The dlload instruction extracts a module from a DLL. This instruction also 
requires the value representation of a module type, the type that is expected of 
the module in the DLL. The instruction performs a run-time interface contain- 
ment check, and if this succeeds it coerces the module in the DLL to the required 
type. If the interface check fails, control transfers to the failure continuation in 
register $r_2$.

The interface check includes a check for entailment of type sharing con- 
straints. The simple form of type sharing constraints, only relating type iden- 
tifiers, and the fact that the bindings in the type heap are opaque, facilitate 
this entailment check. The fact that type heap bindings are opaque also has the 
benefit that the dynamic type check cannot violate encapsulation of abstract 
types; this is explained in more detail in [?].
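
The control flow of dldynamic and dlload can be sketched in Python. This is a deliberately coarse model built on assumptions: a module is a dictionary from external labels to values, a module type is reduced to a set of imported and a set of exported labels (kinds, types and sharing constraints are omitted), and the failure continuation is an ordinary callable.

```python
def dldynamic(module, object_type):
    """Bundle a module with a description of its type, forming a DLL value."""
    return {"module": module, "type": object_type}


def conforms(actual, expected):
    """Label-level containment: no extra imports, no missing exports."""
    return (actual["imports"] <= expected["imports"]
            and expected["exports"] <= actual["exports"])


def dlload(dll, expected, on_failure):
    """Extract a module from a DLL, or transfer to the failure continuation."""
    if not conforms(dll["type"], expected):
        return on_failure()
    # On success the module is coerced to the expected interface.
    return {l: v for l, v in dll["module"].items() if l in expected["exports"]}
```

Loading succeeds when the expected interface asks for a subset of what the DLL exports, and the result exposes only the requested labels; otherwise the failure continuation runs and no module is produced.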

9 Related Work 

There has been a great deal of work on the semantics of module interconnection
languages, particularly in the context of the ML module system [?].
The notion of separating external and internal field names, with the latter
allowing renaming to avoid name clashes, originated with Harper and Lillibridge
[?]. A related idea is used by Riecke and Stone to allow fields of an object
to be made private, and the object then extended with a field with the same
external name. Similar notions of internal and external names appear in the
module calculi of Ancona and Zucca [?] and Wells and Vestergaard [?].

Cardelli [?] gives a semantics for Unix-style linking in terms of a simple
λ-calculus, ensuring that all symbols in a program are resolved before it is executed.
Flatt and Felleisen [?] and Glew and Morrisett [?] extend this work to consider
typed module contents and circular import dependencies. It is not clear what
the type of a module is in these approaches (linking simply resolves imports 
against exports in a type-safe way). Glew and Morrisett do not support shared 
libraries (type sharing) or dynamic linking. Flatt and Felleisen allow dynamic 
linking of units. However the invoke operation for initializing a unit returns a 
single core language value; there is no other way for a program to access the 
contents of a unit. The invoke operation takes as arguments types and values 
from the running program that can be provided as imports to a library before 
initialization. So there are really two linking operations with units: the linking 
operation for merging units and the more limited linking that is implicitly part of 
the semantics of initialization. Our approach provides a single linking operation,
and addresses the problem of sharing type (and value) identity that is not con- 
sidered by these other approaches. In our system, the invoke operation of units 
would translate into a sequence of dlsetsym_t and dlsetsym_v instructions, to 
build the imports for the unit, followed by a dlopen instruction to initialize the 
unit, followed by a dlsym_v instruction to retrieve the single value returned by 
unit initialization. 

Crary et al. [?] give an explanation of recursive modules in terms of the
structure calculus [?]. Their work is predicated on the assumption that module
linking is based on functor instantiation, and phase-splitting allows this to be
transformed to core-language function application. As discussed in [?], it is
difficult to generalize this model of linking to the kinds of module operations we
consider.

Work on dynamic linking in ML has focused on dynamic types [?].
With these approaches a dynamic value tags a value with a runtime type tag, of 
type Dynamic. This is similar to our approach to dynamic linking, but extended 
to modules rather than simple values, as a way of reifying modules into the core 
language. 

A perennial problem with dynamics is that they violate encapsulation, in the 
sense that the underlying representation type of a value with abstract type can 
be exposed, by first bundling the value as a dynamic and then using runtime 
type checks to examine the representation type. This is an artifact of the fact 
that types are bound at runtime using beta-reduction. As mentioned in Sect. 8,
our approach to DLLs avoids this problem, because the bindings in the type
heap remain opaque during program execution. A similar approach is possible
in the system of Hicks et al. [?].

Russo [?] considers an approach to adding first-class modules to ML, based
on converting module values to core language values and back again. Explicit 
type annotations for modules ensure there are no unpleasant interactions with 
type inference. Russo avoids the undecidability of type-checking with first-class 
modules by omitting type subsumption for modules converted to core language 
values. This is similar to our approach to ensuring decidability with first-class 
modules. Our reflective treatment of DLLs is different from Russo’s treatment of 
first-class modules. A module reified into the core language in Russo’s approach 
retains its type, though reified to a core language type. In contrast, our reification 
operation (for building a DLL) masks the type entirely, and there must then be a 
reflection operation (with a dynamic type check) that extracts a module from a 
DLL. Dynamic typing is not necessary with Russo’s approach, since his purpose 
is not to provide DLLs. 

Ancona and Zucca [?], building on earlier work in mixin modules [?],
provide a primitive calculus of modules that supports circular dependencies. Types
are restricted to branded types, that is, types where equivalence is based on 
name equivalence rather than structural equivalence. They do not consider dy- 
namic linking or shared libraries (and the resulting issues with recursive type 
constraints). 
Wells and Vestergaard [?] present a calculus for equational reasoning about
first-class modules. They do not place any restrictions on circular import
dependencies (including dependencies between value components), allowing circular
definitions that lazily unwind. They verify strong normalization and confluence
for their calculus, relying on a lazy reduction semantics. They do not consider
typing aspects of their calculus. So, for example, they do not consider the
problem of equi-recursive versus iso-recursive types, and they provide no support for
shared libraries. Finally, as with Russo’s work, there is no consideration of
narrowing a DLL to a specific interface, an important practical facility for dynamic
linking.

Crary, Hicks and Weirich [?] extend TAL with primitive operations for
building type-safe DLLs, on top of which more expressive dynamic linking
mechanisms can be constructed. For example, they are able to provide a type-safe
implementation of the Unix dynamic linking API, as well as an implementation
of units. Their approach amounts to extending the TAL kernel with type-safe
checked casting [?]. Although their approach is type-safe, it is also more
low-level than the approach described here, and so some errors that are caught
statically in our type system are only caught dynamically by checked casting
in their approach. The single type failure point in our calculus is the dlload
operation, which reflects a DLL from the core language into the module language.
The difference is really one of level; their approach could for example be the
basis for an implementation of TMAL.

The module type system underlying that of Crary et al. is MTAL, and therefore
it shares the limitations of MTAL: the absence of coercive interface matching,
and the absence of sharing. There are no operations for linking modules
together at run-time; rather, modules are loaded into a running program and
their imports resolved against bindings in the global program heaps. Crary et
al. allow a module’s contents to be accessed before all of its imports have been
resolved, allowing “lazy” resolution. In our approach a continuation can specify
(as a module) the definitions it requires, and the continuation argument can be 
linked with other modules. To ensure that a module is initialized (opened) no 
more than once, a module cache can be implemented: the first time a module is 
initialized, a shared library is constructed (using dlsetsym_t and dlsetsym_v) 
with the same interface, and this shared library saved in the cache with the same 
module name used to load the original module. Subsequent searches for this li- 
brary will find the cached version, and it can be used for example to resolve the 
imports of subsequent DLLs. In this way a form of “lazy loading” as found in 
Java class loaders can be implemented on top of our module system. 
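
The module cache mentioned above might look like the following Python sketch. It is an assumption-based illustration, not part of TMAL: `load` and `initialize` are hypothetical callables standing in for reading a module by name and for the dlopen step that produces the shared library to be cached.

```python
class ModuleCache:
    """Initialize each named module at most once and reuse the result."""

    def __init__(self, load, initialize):
        self._load = load              # hypothetical: name -> loaded module
        self._initialize = initialize  # hypothetical: module -> shared library
        self._cache = {}

    def open(self, name):
        """Return the cached shared library, initializing on first use."""
        if name not in self._cache:
            self._cache[name] = self._initialize(self._load(name))
        return self._cache[name]
```

A second request for the same module name returns the cached object without running initialization again, which is the property needed to resolve the imports of later DLLs against a single initialized copy.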

10 Conclusions 

We have described Typed Module Assembly Language (TMAL), an extension 
of typed assembly language with instructions for manipulating modules at run- 
time. These instructions include support for coercive interface matching, dy- 
namically importing definitions from a library, constructing shared libraries, and 
using DLLs in a type-safe manner. A possible application of these mechanisms
is in component-based programming environments, as demonstrated by com- 
mercial platforms based on COM or Java. The mechanisms described here can 
be used to enrich such environments with flexible but type-safe operations for 
interconnecting modules under program control. 

It is plausible that this is not the final word on the choice of instruction set for 
TMAL. Although the instructions for dynamic imports and shared libraries are 
fairly RISC-y, this is not true of the dllink, dlcoerce and dlrename instruc- 
tions, nor is it true of the dlload and dldynamic instructions. We are considering 
how these instructions could be decomposed into simpler instructions, to weaken 
the atomicity requirements of the current instruction set. 

A Type Rules for Modules and Symbol Tables



This appendix summarizes the type rules for modules and symbol tables. The
type rules for values and heaps are specified using judgements of the form given
in Fig. 5. The contexts of type and value heap bindings are defined by:

$\widehat{\Phi} = \{ (t : K) \mid (\mathsf{t} :: t : K) \in \Phi \}$

$\widehat{\Psi} = \{ (x : A) \mid (\mathsf{x} :: x : A) \in \Psi \}$

The type rules for modules (object files) require that the type heap satisfies 
the exported type heap interface, that the value heap satisfies the exported value 
heap interface, and that the exported type sharing constraints are entailed by 
the type sharing implied by the type heap, the type sharing context, and the 
type sharing constraints imposed on the imports. 
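$\Delta \vdash \Phi$                                                              Type context formation
$\Phi; \Delta \vdash \Phi'$                                                       Type heap interface
$\Phi; \Delta; \Xi \vdash \Phi' = \Phi''$                                         Type heap interface equality
$\Phi; \Delta; \Xi \vdash \Phi' \leq \Phi''$                                      Type heap interface containment
$\Phi; \Delta; \Xi \vdash TH : \Phi'$                                             Type heap
$\Phi; \Delta \vdash \Xi$                                                         Sharing heap interface
$\Phi; \Delta; \Xi \vdash \Xi'$                                                   Entailment of type sharing constraints
$\Phi; \Delta \vdash \Psi$                                                        Value heap interface
$\Phi; \Delta; \Xi \vdash \Psi' = \Psi''$                                         Value heap interface equality
$\Phi; \Delta; \Xi \vdash \Psi' \leq \Psi''$                                      Value heap interface containment
$\Phi; \Delta; \Xi; \Psi \vdash VH : \Psi'$                                       Value heap
$\Phi; \Delta \vdash K$                                                           Kind formation
$\Phi; \Delta; \Xi \vdash K = K'$                                                 Kind equality
$\Phi; \Delta; \Xi \vdash K \leq K'$                                              Kind containment
$\Phi; \Delta \vdash A : K$                                                       Type formation
$\Phi; \Delta; \Xi \vdash A = B \in K$                                            Type equality
$\Phi; \Delta; \Xi \vdash [Int_I \Rightarrow Int_E] \leq [Int'_I \Rightarrow Int'_E]$   Module type containment
$\Phi; \Delta; \Xi; \Psi \vdash h : A$                                            Type of heap value
$\Phi; \Delta; \Xi; \Psi \vdash w : A$                                            Type of word value
$\Phi; \Delta \vdash \Gamma$                                                      Register file type
$\Phi; \Delta; \Xi; \Psi \vdash R : \Gamma$                                       Register file
$\Phi; \Psi; \Xi \vdash \{\Delta; \Gamma\}\ I\ \{\Delta'; \Gamma'\}$              Instruction formation

Fig. 5. Judgement Forms of TMAL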



{}; “ 1“ [Inti IntE] : ty Inti = I'l, ^i) Ints = I'e, ^e) 

$' = $u¥i LI TENV{TH) = SU~iLI SHARE{ TH) 

S' h TH : Se S' h TH : T>e 
T>; A-,S';Eu¥iLI VENV{ VH); A hvai VH : Ee 

<P',<I>', A', S \T A hvai [Inti ^ {TH j VH) : Int e[ '■ [Inti ^ Int e[ 

(Val Object File) 

<T- A; S; T-, A hyai e : A x 7 ^ ★ 

<?; $] A] S-,T; A hvai {x :: x : A = e} : {x :: x : A} 

{y : A') G T <P; A; S \- A = A' : ty x ^ * 

<P; $; A; S; T; A hvai {x x ■. A y} ■. {x x ■. A} 



(Heap Val Dec)

(Heap Val Share)
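$\dfrac{\Phi; \{\}; \Xi \vdash A : K \qquad \mathsf{t} \neq *}{\Phi; \Xi \vdash \{\mathbf{type}\ \mathsf{t} :: t : K = A\} : \{\mathsf{t} :: t : K\}}$   (Heap Type Dec)

$\dfrac{t' \in Names(\Phi) \qquad \Phi; \{\}; \Xi \vdash t' : K \qquad \mathsf{t} \neq *}{\Phi; \Xi \vdash \{\mathbf{type}\ \mathsf{t} :: t : K = t'\} : \{\mathsf{t} :: t : K\}}$   (Heap Type Share)

$\dfrac{\Phi; \{\}; \Xi \vdash A : K}{\Phi; \Xi \vdash \{\mathsf{t} :: t : K = A\} : \{\}}$   (Heap Share Dec)

$\dfrac{t' \in Names(\Phi)}{\Phi; \Xi \vdash \{\mathsf{t} :: t : K = t'\} : \{t = t' \in ty\}}$   (Heap Share Share)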



There are also rules for typing “private” type and value fields of a module
(private fields have the special external name *).

The type rule for symbol tables is relatively straightforward. A symbol table 
is a mapping from type and value external names to type and value labels, 
respectively, in the global type and value heaps. The side-condition $\Phi' \subseteq \Phi$
means that, in checking the well-formedness of types and kinds, global type 
heap labels are chosen to be consistent with the internal type names used in the 
interface of the symbol table. 



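$\dfrac{\begin{array}{c}
Int_E = (\Phi', \Psi', \Xi') \qquad ST = \{\overline{\mathsf{t} \mapsto t}, \overline{\mathsf{x} \mapsto x}\} \\
\Phi; \Delta \vdash \Phi' \qquad \Phi; \Delta; \Xi \vdash \Xi' \\
\Phi' = \{\overline{\mathsf{t} :: t : K}\} \subseteq \Phi \qquad \Psi' = \{\overline{\mathsf{x} :: x : A}\}
\end{array}}{\Phi; \Delta; \Xi; \Psi \vdash ST : Int_E}$   (Val Symbol Table)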



The VENV, TENV and SHARE metafunctions are defined as follows: 

$VENV(VH) = \{ (x : A) \mid (\mathsf{x} :: x : A = h) \in VH \}$

$TENV(TH) = \{ (t : K) \mid (\mathsf{t} :: t : K = A) \in TH \}$

$SHARE(TH) = \{ (t_1 = t_2 \in ty) \mid (\mathsf{t}_1 :: t_1 : K = t_2) \in TH \}$



B Semantics of Module Linking Instructions 

In this appendix we provide more details of the static and dynamic semantics of 
the instructions of TMAL. The reduction rules use program states of the form 

$(TH, VH, R, I)$

where R is a register file and I an instruction stream, and

$\widehat{TH} = \{ (t = A) \mid (\mathsf{t} :: t : K = A) \in TH \}$

$\widehat{VH} = \{ (x = h) \mid (\mathsf{x} :: x : A = h) \in VH \}$

The global type and value heaps never contain shared bindings; such bindings
are removed from an object file’s type and value heaps, as part of initialization, 
before they are merged with the global heaps. 

The type rules for dllink, dlcoerce and dlrename are similar to that for 
similar constructs described in m We omit the rules here for lack of space. The 
reduction rules for these instructions are given by: 

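$\dfrac{\begin{array}{c}
R(r^m_i) = x_i \text{ and } VH(x_i) = [Int^i_I \Rightarrow (TH_i, VH_i) : Int^i_E],\quad i = 2, 3 \\
Int^1_E = (Int^2_E \cup Int^3_E) \qquad Int^1_I = (Int^2_I \sqcup Int^3_I) \setminus \{ n \mid n \in idom(Int^1_E) \} \\
TH_1 = TH_2 \cup TH_3 \qquad VH_1 = VH_2 \cup VH_3 \qquad x_1 \notin dom(VH) \\
idom(TH_2) \cap idom(TH_3) = \{\} \qquad idom(VH_2) \cap idom(VH_3) = \{\} \\
VH' = VH \cup \{ x_1 = [Int^1_I \Rightarrow (TH_1, VH_1) : Int^1_E] \}
\end{array}}{(TH, VH, R, (\mathtt{dllink}\ r^m_1, r^m_2, r^m_3; I)) \longrightarrow (TH, VH', R[r^m_1 \mapsto x_1], I)}$

(Red DL Link)

$\dfrac{\begin{array}{c}
R(r^m_2) = x_2 \text{ and } VH(x_2) = [Int'_I \Rightarrow (TH', VH') : Int'_E] \\
OT = [Int_I \Rightarrow Int_E] \qquad x_1 \notin dom(VH) \\
VH'' = VH \cup \{ x_1 = [Int_I \Rightarrow (COERCE(TH', Int_E), COERCE(VH', Int_E)) : Int_E] \}
\end{array}}{(TH, VH, R, (\mathtt{dlcoerce}\ r^m_1, r^m_2, OT; I)) \longrightarrow (TH, VH'', R[r^m_1 \mapsto x_1], I)}$

(Red DL Coerce)

$\dfrac{\begin{array}{c}
R(r^m_2) = x_2 \text{ and } VH(x_2) = [Int_I \Rightarrow (TH', VH') : Int_E] \\
x_1 \notin dom(VH) \qquad VH'' = VH \cup \{ x_1 = [\rho(Int_I) \Rightarrow (\rho(TH'), \rho(VH')) : \rho(Int_E)] \}
\end{array}}{(TH, VH, R, (\mathtt{dlrename}\ r^m_1, r^m_2, \rho; I)) \longrightarrow (TH, VH'', R[r^m_1 \mapsto x_1], I)}$

(Red DL Rename)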

The Red DL Link rule for the dllink instruction computes the new import
list using the join operation $\sqcup$, rather than simply taking the union of the
import lists of the two object files being merged. This is because, for a
definition that is imported by both object files, the new import list must
constrain the import to one compatible with both of the preceding import lists. For
example, if one of the argument object files imports a type field t with kind ty, 
while another imports a type field t with kind Klint, then the new object file 
resulting from merging imports a definition of t with kind Klint. In computing 
the new import list, the dllink instruction removes from the import list any 
symbols that can be resolved against the combined export list. This operation 
is defined in Pg. 
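
The way dllink combines import lists can be illustrated with a small model. This is a sketch under stated assumptions, not the paper's definition: interfaces are reduced to dictionaries from field names to kind names, `k_int` is a hypothetical stand-in for the more specific kind in the example above, and `SUBKIND`, `combine_kinds` and `link_imports` are names invented here.

```python
from typing import Dict, Set

# Assumed kind ordering for the sketch: (sub, super) pairs.
# "k_int" is a hypothetical kind that is more specific than "ty".
SUBKIND = {("k_int", "ty")}


def kind_leq(k1: str, k2: str) -> bool:
    """True if k1 is at least as specific as k2."""
    return k1 == k2 or (k1, k2) in SUBKIND


def combine_kinds(k1: str, k2: str) -> str:
    """Return the kind that is compatible with both import requirements."""
    if kind_leq(k1, k2):
        return k1
    if kind_leq(k2, k1):
        return k2
    raise ValueError(f"incompatible import kinds: {k1} and {k2}")


def link_imports(imports_a: Dict[str, str], imports_b: Dict[str, str],
                 exports_a: Set[str], exports_b: Set[str]) -> Dict[str, str]:
    """Combine two import lists, then drop anything the merged exports resolve."""
    exports = exports_a | exports_b
    combined = dict(imports_a)
    for name, kind in imports_b.items():
        if name in combined:
            combined[name] = combine_kinds(combined[name], kind)
        else:
            combined[name] = kind
    return {n: k for n, k in combined.items() if n not in exports}


# One file imports t at kind ty and u at kind ty; the other imports t at the
# more specific kind and exports u.
result = link_imports({"t": "ty", "u": "ty"}, {"t": "k_int"}, set(), {"u"})
```

Here `t` stays in the import list at the more specific kind, while `u` disappears because the second object file exports it.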

The dlcoerce instruction uses the COERCE metafunction to hide bindings 
in the heaps (renaming their external name to *) that are made private by the 
new object file type and export interface. A definition of this metafunction is 
provided in ini. 

The following type rule and reduction rule explain the semantics of the
dlopen operation. The metafunction dom denotes the domain of a mapping,
while $idom(TH) = dom(\widehat{TH})$ and $idom(VH) = dom(\widehat{VH})$. $\Gamma[r : A]$ denotes the
replacement of the type of r (if any) in the register file type $\Gamma$ with the new
type A. $R[r \mapsto w]$ denotes the replacement of the contents of r in the register file
R with the new contents w. The dlopen operation expects register $r^m$ to point
to a module with type $[Int_I \Rightarrow Int_E]$, where $Int_I = (\{\}, \{\}, \{\})$. The operation
leaves in register $r^s$ a pointer to a symbol table with interface $Int_E$, after adding
the heaps of the module to the program heaps:



<!■ A; .j/; H h r- : [({}, {}, {}) ^ IuIe] F' = : I^e] 

{A; r} (dlopen r", r”") {A; T'} 



(Instr DL Open) 



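$\dfrac{\begin{array}{c}
R(r^m) = x \text{ and } VH(x) = [(\{\},\{\},\{\}) \Rightarrow (TH', VH') : Int_E] \\
ST = \{ (\mathsf{t} \mapsto t) \mid (\mathsf{t} :: t : K) \in Int_E \} \cup \{ (\mathsf{x} \mapsto x) \mid (\mathsf{x} :: x : A) \in Int_E \} \\
x' \notin idom(VH) \cup idom(VH') \qquad idom(TH) \cap idom(TH') = \{\} \qquad TH'' = TH \cup TH' \\
idom(VH) \cap idom(VH') = \{\} \qquad VH'' = CLOS(VH \cup VH' \cup \{x' = ST\})
\end{array}}{(TH, VH, R, (\mathtt{dlopen}\ r^s, r^m; I)) \longrightarrow (TH'', VH'', R[r^s \mapsto x'], I)}$

(Red DL Open)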



The CLOS{VH) operation removes shared value bindings of the form x : 
A = y from the value heap, by dereferencing y to its heap value definition: 



$CLOS(VH) = \{ (x = h) \mid x \in dom(VH),\ h = DEREF_{VH}(x) \}$

$DEREF_{VH}(x) = \begin{cases} h & \text{if } (x = h) \in \widehat{VH} \\ h & \text{if } (x = y) \in VH,\ h = DEREF_{VH}(y) \end{cases}$



The result of CLOS{ VH) is undefined if VH contains circular shared value bind- 
ings. This corresponds to an initialization failure due to cycles in the specification 
of initial values. 
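
The closure operation can be sketched as follows. This is an illustrative model only: the value heap is a Python dictionary, `Share` is a hypothetical marker for a shared binding that points at another label, and `InitializationError` is a name chosen here for the undefined (cyclic) case.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Share:
    """A shared binding: this label takes its value from another label."""
    target: str


class InitializationError(Exception):
    """Raised when shared bindings form a cycle, so no value can be found."""


def deref(heap: Dict[str, object], label: str) -> object:
    """Follow shared bindings from label until a heap value is reached."""
    seen = set()
    while isinstance(heap[label], Share):
        if label in seen:
            raise InitializationError(f"circular shared binding at {label!r}")
        seen.add(label)
        label = heap[label].target
    return heap[label]


def clos(heap: Dict[str, object]) -> Dict[str, object]:
    """Replace every shared binding by the heap value it ultimately denotes."""
    return {label: deref(heap, label) for label in heap}


closed = clos({"x": 1, "y": Share("x"), "z": Share("y")})
```

A heap such as `{"a": Share("b"), "b": Share("a")}` makes `clos` raise `InitializationError`, mirroring the initialization failure described above.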

The type rule and reduction rule for the dlsym_v instruction are as follows: 



<P-,A-,T-S\- H : {-P',T',S') 

jx::.:A)eT' FVjA) mdomj<P') = {} T' = Tjr : A] ^L SvMv) 

h {Z\; r} (dlsym_v r, r'*, x) {Z\; T } 



$\dfrac{R(r^s) = x \text{ and } VH(x) = ST}{(TH, VH, R, (\mathtt{dlsym\_v}\ r, r^s, \mathsf{x}; I)) \longrightarrow (TH, VH, R[r \mapsto ST(\mathsf{x})], I)}$

(Red DL Symv)

R(v) denotes the application of the register file R to the small value (register 
or word value) v: 



$R(v) = \begin{cases} w & \text{if } v = w \\ R(r) & \text{if } v = r \end{cases}$

The type rule and reduction rule for the dlsym_t instruction are as follows:



{t t ■. K) e <P' $uS'-,A-,EUE'\-K<K' t ^ dom($) 

A' = A\j{t-. K'} r' = r[rt : {$1 U $ 2 , H')] 
h {Zi; r} (dlsym.t \t : K']r{,rl,t) {Z\'; T'} 



(Instr DL Symt) 



R(r^2) = x &uAVH{x) = ST 5T = 5T' W {t i-^ 1} R' ^ R[r{ ST'] 

(Tff , , i?, (dlsym_t [t' : A'']r^r|,t; /)) — > {TH ,VH , R' , 

(Red DL Symt) 



In the reduction rule, the local type identifier t' is bound to the global type 
heap address t of the type definition pointed to by the symbol table. This allows 
the remainder of the instruction stream I to access the value heap definitions, 
pointed to by the symbol table, that have references to this type heap address. 

Type heap addresses and type identifiers serve only to support type-checking 
of the assembly code, and are stripped for run-time execution. The substitution 
{t/t'}I is performed only in the abstract reduction semantics. Although we do 
not elaborate on it further here, the dlsym_t instruction can be generalized to 
import run-time type tags from a DLL, for languages such as Java and Modula-3 
that associate type tags with some values. 

For the next two instructions, we abuse notation slightly by allowing union 
and set difference operations to be applied to interfaces. These are to be under- 
stood as the operations distributing over the components of the interfaces, for 
example: 



$(\Phi_1, \Psi_1, \Xi_1) \cup (\Phi_2, \Psi_2, \Xi_2) = (\Phi_1 \cup \Phi_2, \Psi_1 \cup \Psi_2, \Xi_1 \cup \Xi_2)$

$(\Phi, \Psi, \Xi) \cup \Phi' = (\Phi \cup \Phi', \Psi, \Xi)$

The type rule for the dlsetsym_v instruction is reasonably straightforward. 
The only complication is that the type of the value field being assigned may 
have free type identifiers that are bound in the module. The typing rule relies 
on type sharing constraints in the module type that relate these locally bound 
type identifiers to global identifiers bound by the program type heap: 



A;^; S h V : A A;<F; S h : [Intj Int^j 
Inti = (^/, ^i) IntE = {^E, tf'B, ^e) {x :: X : B) G I'j 
$LI<P iLI <Pe] A; S L) Si LI Se a = B G ty 
r' = I— >■ [{Inti — {x :: X : B}) {IntE U {z :: x : i?})]] 

^y^yS L {A-,r} (dlsetsym.v r™, r™, n, x) {AyB'} 

(Instr DL SetSymv) 

$\dfrac{\begin{array}{c}
R(r^m_2) = x \text{ and } VH(x) = [Int_I \Rightarrow (TH', VH') : Int_E] \qquad (\mathsf{x} :: x : A) \in Int_I \qquad R(v) = y \\
Int'_I = Int_I - \{\mathsf{x} :: x : A\} \qquad Int'_E = Int_E \cup \{\mathsf{x} :: x : A\} \\
z \notin dom(VH) \qquad VH'' = VH \cup \{ z = [Int'_I \Rightarrow (TH', VH' \cup \{\mathsf{x} :: x : A = y\}) : Int'_E] \}
\end{array}}{(TH, VH, R, (\mathtt{dlsetsym\_v}\ r^m_1, r^m_2, v, \mathsf{x}; I)) \longrightarrow (TH, VH'', R[r^m_1 \mapsto z], I)}$

(Red DL SetSymv)

The dlsetsym_t instruction for assigning a type field in a module similarly
relies on type sharing to equate any local type identifiers with global type
identifiers in the kind of the type being assigned. Type identifiers may appear
free in the kind of a field with box kind. Once a type field has been assigned, a
type sharing constraint is added to the export interface of the module, to allow 
subsequent value fields to be assigned: 

$-A\-t':K S 'r ■. [Inti => Ints] 

Inti = {It>i,'Ri , Ei) Ints = {^e,'I'e , Ee) {t :: t : K') G <I>i 
U^; A;EUEiUEe\-K^K' 

r' = r[rT ^ [{Inti -{t-.-.t: K'}) ^ {IntE U {{t :: t : K'), {tGit' G A'')})]] 

$\'R\E \- {A', r} (dlsetsym_t , t' , t) {A; F'} 

(Instr DL SetSymt) 

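$\dfrac{\begin{array}{c}
R(r^m_2) = x \text{ and } VH(x) = [Int_I \Rightarrow (TH', VH') : Int_E] \qquad (\mathsf{t} :: t : K) \in Int_I \\
Int'_I = Int_I - \{\mathsf{t} :: t : K\} \qquad Int'_E = Int_E \cup \{ (\mathsf{t} :: t : K), (t = t' \in K) \} \\
z \notin dom(VH) \qquad VH'' = VH \cup \{ z = [Int'_I \Rightarrow (TH' \cup \{\mathsf{t} :: t : K = t'\}, VH') : Int'_E] \}
\end{array}}{(TH, VH, R, (\mathtt{dlsetsym\_t}\ r^m_1, r^m_2, t', \mathsf{t}; I)) \longrightarrow (TH, VH'', R[r^m_1 \mapsto z], I)}$

(Red DL SetSymt)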

Finally the reduction rules for the instructions for creating a DLL, and for 
extracting a module from a DLL, are as follows: 

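$\dfrac{x \notin dom(VH) \qquad VH' = VH \cup \{ x = \langle\langle R(v), OT \rangle\rangle \}}{(TH, VH, R, (\mathtt{dldynamic}\ r, v, OT; I)) \longrightarrow (TH, VH', R[r \mapsto x], I)}$

(Red DL Dynamic)

$\dfrac{\begin{array}{c}
R(r_1) = x \text{ and } VH(x) = \langle\langle y, OT \rangle\rangle \\
OT = [Int_I \Rightarrow Int_E] \qquad OT'' = [Int'_I \Rightarrow Int'_E] \qquad VH(y) = [Int_I \Rightarrow (TH', VH') : Int_E] \\
TENV(TH); \{\}; SHARE(TH) \vdash [Int_I \Rightarrow Int_E] \leq [Int'_I \Rightarrow Int'_E] \\
z \notin dom(VH) \qquad VH'' = VH \cup \{ z = [Int'_I \Rightarrow (TH', VH') : Int'_E] \}
\end{array}}{(TH, VH, R, (\mathtt{dlload}\ r^m, r_1, r_2, OT''; I)) \longrightarrow (TH, VH'', R[r^m \mapsto z], I)}$

(Red DL Load Succ)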

The last rule handles the case where loading a DLL (reflecting a DLL from
the core language into the module language) succeeds with a runtime type check.
There is also an associated rule for when the runtime type check fails; in this
case, control transfers to the address specified by the register $r_2$, i.e., $r_2$
contains a pointer to a failure continuation that should be invoked if the runtime
type check fails.
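
The interplay of dldynamic and dlload can be pictured with a small model. This is a simplified sketch, not the TMAL semantics: module types are reduced to dictionaries of import and export field types, `contained_in` is an assumed approximation of module type containment (no more imports than the expected type offers, at least the exports it demands), and all names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ModuleType:
    imports: Dict[str, str] = field(default_factory=dict)
    exports: Dict[str, str] = field(default_factory=dict)


@dataclass
class DLL:
    """A module packaged together with its type, as dldynamic does."""
    module: object
    module_type: ModuleType


def contained_in(actual: ModuleType, expected: ModuleType) -> bool:
    # Assumed containment check: every import the module needs is offered by
    # the expected type, and every export the expected type demands is there.
    imports_ok = all(expected.imports.get(n) == t
                     for n, t in actual.imports.items())
    exports_ok = all(actual.exports.get(n) == t
                     for n, t in expected.exports.items())
    return imports_ok and exports_ok


def dldynamic(module: object, module_type: ModuleType) -> DLL:
    return DLL(module, module_type)


def dlload(dll: DLL, expected: ModuleType,
           on_success: Callable[[object], str],
           on_failure: Callable[[], str]) -> str:
    """Run-time type check; on failure, invoke the failure continuation."""
    if contained_in(dll.module_type, expected):
        return on_success(dll.module)
    return on_failure()


dll = dldynamic("object-file", ModuleType({}, {"f": "int -> int", "g": "int"}))
ok = dlload(dll, ModuleType({}, {"f": "int -> int"}),
            lambda m: "loaded", lambda: "failed")
bad = dlload(dll, ModuleType({}, {"h": "int"}),
             lambda m: "loaded", lambda: "failed")
```

The second call asks for an export the DLL does not provide, so control passes to the failure continuation.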

Abstract. A type-based certifying compiler maps source code to machine
code and target-level type annotations. The target-level annotations
make it possible to prove easily that the machine code is type-safe,
independent of the source code or compiler. To be useful across a range of 
source languages and compilers, the target-language type system should 
provide powerful type constructors for encoding higher-level invariants. 
Unfortunately, it is difficult to engineer such type systems so that anno- 
tation sizes are small and verification times are fast. 

In this paper, we describe our experience writing a certifying compiler
that targets Typed Assembly Language (TALx86) and discuss some general
techniques we have used to keep annotation sizes small and verification
times fast. We quantify the effectiveness of these techniques by
measuring their effects on a sizeable application: the certifying compiler
itself. These techniques, which include common-subexpression
elimination of types, higher-order type abbreviations, and selective
reverification, can dramatically change certificate size and verification time.



1 Background 

A certifying compiler takes high-level source code and produces target code with 
a certificate that ensures that the target code respects a desired safety or security 
policy. To date, certifying compilers have primarily concentrated on producing 
certificates of type safety. For example, Sun’s javac compiler maps Java source
code to statically typed Java Virtual Machine Language (JVML) code. The 
JVML code includes type annotations that a verifier based on dataflow analysis 
can use to ensure that the code is type-safe. 

However, both the instructions and the type system of JVML are at a relatively
high level and are specifically tailored to Java. Consequently, JVML is
ill-suited for compiling a variety of source-level programming languages to high- 
performance code. For example, JVML provides only high-level method-call and 

* This material is based on work supported in part by the AFOSR grant F49620- 
97-1-0013, ARPA/RADC grant F30602- 1-0317, and a National Science Foundation 
Graduate Fellowship. Any opinions, findings, and conclusions or recommendations 
expressed in this publication are those of the authors and do not reflect the views 
of these agencies. 

R. Harper (Ed.): TIC 2000, LNCS 2071, pp. 117-^3 2001. 

(c) Springer-Verlag Berlin Heidelberg 2001
method-return operations. Also, it makes no provision for performing general
tail-calls on methods. Therefore, JVML is a difficult target for compilers of
functional languages such as Scheme that require tail-call elimination. 

In addition, current platforms for JVML either interpret programs or compile 
them further to native code. Achieving acceptable performance seems to demand 
compilation with a good deal of optimization. To avoid security or safety holes, 
the translation from JVML to native code should also be done by a certifying 
compiler. That way, we can verify the safety of the resulting code instead of 
trusting the “just-in-time” compiler. 

Another example of a certifying compiler is Necula and Lee’s Touchstone
compiler. Touchstone compiles a small, type-safe subset of C to optimized
DEC-Alpha assembly language. The key novelty of Touchstone is that the cer- 
tificate it produces is a formal “proof” that the code is type-correct. Checking 
the proof for type-correctness is relatively easy, especially compared to the ad 
hoc verification process for JVML. As such, the Touchstone certificates provide 
a higher degree of trustworthiness. 

The proofs of the Touchstone system are represented using the general- 
purpose logical framework LF El. The advantage of using LF to encode the 
proofs is that, from an implementation perspective, it is easy to change the type 
system of the target language. In particular, the proof checker is parameterized 
by a set of primitive axioms and inference rules that effectively define the type 
system. The checker itself does not need to change if these rules are changed. 
Consequently, the use of LF makes it easy to change type systems to adapt to 
different source languages or different compilation strategies. Indeed, more re- 
cent work uses a very different type system for certifying the output of Special 
J PEI, a compiler for Java. 

Although changing the type system is easy for the implementor, doing so 
obligates one to an enormous proof burden: Every change requires a proof of the 
soundness of the type system with respect to the underlying machine’s semantics. 
Constructing such proofs is an extremely difficult task. In the absence of a proof, 
it is not clear what assurances a verifier is actually providing. 



1.1 An Alternative Approach 

Our goal is to make it easy for certifying compilers to produce provably type- 
correct code without having to change the type system of the target language. 
That way, it suffices to write and trust one verifier for one type system. To- 
ward this end, we have been studying the design and implementation of general- 
purpose type systems suitable for assembly language I2IE1. Ultimately, we hope 
to discover typing constructs that support certifying compilation of many orthog- 
onal programming-language features. 

Our current work focuses on the design of an extremely expressive type system
for Intel’s IA32 assembly language and a verifier we call TALx86. Where
possible, we have avoided including high-level language abstractions like
procedures, exception handlers, or objects. In fact, the only high-level operation
that is a TALx86 primitive is memory allocation. We also have not “baked in”
compiler-specific abstractions such as activation records or calling conventions. 
Rather, the type system of TALx86 provides a number of primitive type construc- 
tors, such as parametric polymorphism, label types, existential types, products, 
recursive types, etc., that we can use to encode language features and compiler 
invariants. These type constructors have either been well studied in other con- 
texts or modeled and proven sound by our group. 

In addition, we and others have shown how to encode a number of impor- 
tant language and compiler features using our type constructors. For example, 
our encoding of procedures easily supports tail-call optimizations because the 
control-flow transfers are achieved through simple machine-level jumps. In other 
words, we do not have to change the type system of TALx86 to support these op- 
timizations. Type soundness of TALx86 ensures that compilers targeting TALx86 
produce only code with safe run-time behavior. Some specific assurances are that 
the program counter will always point to executable code, unallocated memory 
will never be dereferenced, and system routines (such as input/output) will never 
be called with inappropriate arguments. In these respects, TALx86 provides an 
attractive target for certifying compilers. 

1.2 The Problem 

Unfortunately, there is a particularly difficult engineering tradeoff that arises 
when a certifying compiler targets a general-purpose type system like TALx86: 
Encoding high-level language features, compiler invariants, and optimizations 
into primitive type constructors results in extremely large types and type an- 
notations — often much larger than the code itself. Thus, there is a very real 
danger that our goal of using one general-purpose type system will be defeated 
by practical considerations of space and time. 

The work presented here is a case study in writing a certifying compiler that 
targets the general-purpose typed assembly language TALx86. The source lan- 
guage for our compiler, called Popcorn, shares much of its syntax with C, but 
it has a number of advanced language features including first-class parametric 
polymorphism, non-regular algebraic datatypes with limited pattern matching, 
function pointers, exceptions, first-class abstract data types, modules, etc. In- 
deed, the language is suitably high-level that we have ported various ML libraries 
to Popcorn without needing to change their structure substantially. The certify- 
ing compiler for Popcorn is itself written in Popcorn. 

Although the TALx86 type system is very expressive, it is far from being able 
to accept all safe assembly programs. However, we have found that it is expres- 
sive enough to allow a reasonable translation of Popcorn’s linguistic features. 
Because the compiler’s invariants are encoded in the primitive typing constructs 
of TALx86, the most difficult aspect of efficient, scalable verification is handling
the potentially enormous size of the target-level types. We use our experience 
to suggest general techniques for controlling this overhead that we believe tran- 
scend the specifics of our system. The efficacy of these techniques is demonstrated 
quantitatively for the libraries and compiler itself. In particular, the size of the
type annotations and the time needed to verify the code are essentially linear
in the size of the object code. The constant factors are small enough to permit 
verification of our entire compiler in much less than one minute. 

In the next section, we give a taxonomy for general approaches to reducing 
type-annotation overhead and further discuss other projects related to certifying 
compilation. Although it is an informal description of existing techniques, we 
have found this classification useful and we know of no other attempt to classify 
the approaches. 

In Section 3, we summarize relevant aspects of the TALx86 type system,
annotations, and verification process. We then show how these features are used 
to encode the provably safe compilation of the control-flow aspects of Popcorn, 
including procedures and exceptions. This extended example demonstrates that 
an expressive type system can permit reasonable compilation of a language for 
which it is not specifically designed. It also shows qualitatively that if handled 
naively, type-annotation size becomes unwieldy. 

In Section 4, we use the example to analyze several approaches that we
have examined for reducing type-annotation overhead. Section 5 presents the
quantitative results of our investigation; we conclude that the TALx86 approach 
scales to verify our Popcorn compiler, the largest Popcorn application we have 
written. Moreover, all of the techniques contribute significantly to reducing the 
overhead of certifying compilation. Finally, we summarize our conclusions as a 
collection of guidelines for designers of low-level safety policies. 

2 Approaches to Efficient Certification 

Keeping annotation size small and verification time fast in the presence of op- 
timizations and advanced source languages is an important requirement for a 
practical system that relies on certified code. In this section, we classify some 
approaches to managing the overhead of certifying compilation and discuss their 
relative merits. None of the approaches are mutually exclusive; any system will 
probably have elements of all of them. 

The “Bake it in” Approach. If the type system supports only one way of com- 
piling something, then compilers do not need to write down that they are using 
that way. For example, the type system could fix a calling convention and require 
compilers to group code blocks into procedures. JVML, Touchstone, and Special 
J all use this approach. 

Baking in assumptions about procedures eliminates the need for any annota- 
tions describing the interactions between procedures. However, it inhibits some 
inter-procedural optimizations, such as inter-procedural register allocation, and 
makes it difficult to compile languages with other control features, such as ex- 
ception handlers. In general, the “bake it in” approach reflects particular source 
features into the target language rather than providing low-level constructors 
suitable for encoding a variety of source constructs. For example, the certifier 
for Special J first processes a “class descriptor whose form is very close to that
of the JVM class descriptors”, so only programs conforming to the JVML
class hierarchy and type system are certifiable by this checker. 

Even general frameworks inevitably bake in more than the underlying ma- 
chine requires. A TALx86 example is that labels are abstract — well-formed code 
cannot examine the actual address used to implement a label. This abstraction 
prevents some clever implementation techniques. Any verifiable safety policy 
must impose some conservative restrictions; choosing the restrictions is a crucial 
design decision that is a fundamental part of a policy. 

The “Don’t optimize” Approach. If a complicated analysis is necessary to prove 
an optimization safe, then the reasoning involved must be encoded in the an- 
notations. For example, when compiling dynamically typed languages such as 
Scheme, dynamic type tests are in general necessary to ensure type safety. A 
simple strategy is to perform the appropriate type test before every operation. 
With this approach, a verifier can easily ensure safety with a minimum of an- 
notations. This strategy is the essence of the verification approach suggested by 
Kozen. Indeed, it results in relatively small annotations and fast verification,
but at the price of performance and flexibility. 

In contrast, an optimizing compiler may attempt to eliminate the dynamic 
checks by performing a “soft-typing” analysis. However, the optimized code
requires a more sophisticated type system to convince the verifier that type 
tests are unnecessary. To make verification tractable, such type systems require 
additional annotations. For example, the Touchstone type system supports static 
elimination of array-bounds checks, but it requires additional invariants and 
proof terms to support the optimization. 

Another example is record initialization: An easy way to prove that memory 
is properly initialized is to write to the memory in the same basic block in which 
the memory is allocated. Proving that other instruction schedules are safe may 
require dataflow annotations that describe the location of uninitialized memory. 

Unoptimized code also tends to be more uniform, which in turn makes the 
annotations more uniform. For example, if a callee-save register is always pushed 
onto the stack by the callee (even when the register is not used), then the annota- 
tions that describe the stack throughout the program will have more in common. 
Such techniques can improve the results of the “Compression” approach (dis- 
cussed below) at the expense of efficiency. 

The “Reconstruction” Approach. If it is easy for the verifier to infer a correct 
annotation, then such annotations can be elided. For example, Necula shows how 
simple techniques may be used for automatically reconstructing large portions 
of the proofs produced by the Touchstone compiler.

It is important that verification time not unduly suffer, however. For this 
reason, code producers should know the effects that annotation elision can have. 
Unfortunately, in expressive systems such as TALx86, many forms of type recon- 
struction are intractable or undecidable. The verifier could provide some simple 
heuristics or default guesses, but such maneuvers are weaker forms of the “bake 
it in” approach. 




D. Grossman and G. Morrisett 



A more extreme approach to reconstruction would be to include a general- 
purpose theorem prover in the verification system. Unless the prover generates 
proofs that are independently checked, the trusted computing base would become 
larger and more complex. Any generated proofs would need to be concise as well. 
The TALx86 project has maintained the design goal that type-checking should 
be essentially syntax-directed; search and backtracking seem beyond the realm 
of efficient verification. However, recent work by Necula and Rahul suggests 
using annotations not to provide a proof, but instead to guide the prover’s non- 
determinism. In essence, the insight is that a compiler that knows enough about 
the verifier’s decision procedure can guide reconstruction to avoid the overhead 
of search. 

Certification systems invariably use reconstruction when the type of a con- 
struct is straightforward to compute from the types of its parts. For example, 
explicitly typed source languages never require explicit types for every term; 
these types are reconstructed from the explicit types of variables. Similarly, low- 
level systems do not explicitly describe how every single instruction changes the 
abstract state of the program. For most instructions, it is just as efficient to 
examine the instruction and recompute this information. 

The “Compression” Approach. Given a collection of annotations, we could create 
a more concise representation that contains the same information. One technique 
for producing a compact wire format is to run a standard program such as gzip 
on a serialized version. If the repetition in the annotations manifests itself as 
repetition in the byte stream, this technique can be amazingly effective (see 
Section 5). However, it does not help improve the time or space required for
verification if the byte stream is uncompressed prior to processing. 
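
The effect of generic compression on repetitive annotations is easy to observe. The sketch below uses Python's standard gzip module on a made-up annotation string; the annotation syntax is invented for illustration and is not actual TALx86 syntax.

```python
import gzip

# Hypothetical serialized annotation; real annotations would differ.
annotation = b"{eax: int, esp: stack(int, int, rest), ebp: handler}\n"

# A program whose labels mostly carry the same annotation.
serialized = annotation * 200

compressed = gzip.compress(serialized)
restored = gzip.decompress(compressed)

original_size = len(serialized)
compressed_size = len(compressed)
```

Because the repetition shows up directly in the byte stream, `compressed_size` is a small fraction of `original_size`, yet a verifier that first calls `gzip.decompress` still processes all `original_size` bytes.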

A slightly more domain-specific technique is to create a binary encoding 
that shares common subterms between annotations. This approach is effectively 
common-subexpression elimination on types. Because the verifier is aware of 
this sharing, it can exploit it to consume less space. There is an interesting 
tradeoff with respect to in-place modification, however. If a simplification (such 
as converting an annotation to a canonical form for internal use) is sound in 
all contexts, then it can be performed once on the shared term. However, if 
a transformation is context-dependent, the verifier must make a copy in the 
presence of sharing. 

Work on reducing the size of JVML annotations has largely followed the 
compression approach. For example, projects have found ways to exploit
similarities across an entire archive of class files. Also, they carefully design the 
wire format so that downloading and verification may be pipelined. The TALx86 
encoding does not currently have this property, but there is nothing essential to 
the language that prevents it. 

Shao and associates have investigated the engineering tradeoffs of shar- 
ing in the context of typed intermediate languages. They suggest a consistent 
use of hash-consing (essentially on-line common-subexpression elimination) and 
suspension-based lambda encoding as a solution. Their hash-consing scheme
also memoizes the results of type reductions so that identical reductions in the
future require only retrieving the answer from a table. The problem of managing
low-level types during compilation is quite similar to the problem of managing 
them during verification, but in the case of type-directed compilation, it is ap- 
propriate to specialize the task to the compiler. 

Finally, we should note that comparing the size of compressed low-level types 
to the size of uncompressed object code is somewhat misleading because object 
code compresses quite well. Domain-specific techniques include taking the
instruction format into account (instead of the generic compression technique 
of processing entire bytes); detecting common sequences of instructions; and de- 
tecting similarity modulo a rarely repeated field, such as a branch target address. 
Analogous techniques may prove useful for annotations as well, but we know of 
no work that has tried them. 



The “Abbreviation” Approach. The next step beyond simple sharing is to use 
higher-order annotations to factor out common portions. Such annotations are 
essentially functions at the level of types. Tarditi and others used this approach 
in their TIL compiler. As we show in Section 5, this approach can exploit
similarities that sharing cannot. Furthermore, higher-order annotations make it 
relatively easy for a compiler writer to express high-level abstractions within 
the type system of the target language. In our experience, using abbreviations 
places no additional burden on the compiler writer because she is already rea- 
soning in terms of these abstractions. However, if the verifier must expand the 
abbreviations in order to verify the code, verification time may suffer. 

Higher-order abbreviations are also an important component in the
certified-code framework that Appel and Felty propose. They suggest formalizing a
machine’s semantics and a safety policy in a higher-order logical framework. The 
code producer must then supply a formal proof that a program obeys the policy. 
Because a proof expressed directly in terms of a machine’s semantics would 
presumably be enormous, Appel and Felty suggest that a compiler would first 
prove that a collection of lemmas are sound with respect to the semantics and 
then apply the lemmas to a program. In a sense, these lemmas are parameterized 
abbreviations that define a suitably concise type system. 

In our system, we use all of these approaches to reduce annotation size and 
verification time. However, we have attempted to minimize the “bake it in” 
and “don’t optimize” approaches in favor of the other techniques. Unlike javac,
Touchstone, or Special J, TALx86 makes no commitment to calling convention
or data representation. In fact, it has no built-in notion of functions; all control 
flow is just between blocks of code. The design challenge for TALx86, then, is 
to provide generally useful constructors that compilers can use in novel ways to 
encode the safety of their compilation strategies. 

As a type system, TALx86 does “bake in” more than a primitive logical de- 
scription of the machine. For example, it builds in a distinction between integers 
and pointers. As a result, programs cannot use the low bits of pointers to store 
information and then mask these bits before reading memory. Also, memory lo- 
cations are statically divided into code and data (although extensions support 
run-time code generation). In order to investigate the practicality of
expressive low-level safety policies, we have relied on a rigorous, hand-written proof of
type soundness and a procedural implementation of the verifier. A more formal 
approach would be to encode the proof in a logical framework and use a verifier 
produced mechanically from the proof. 

Using our approach, we have been able to examine the feasibility of compiler- 
independent safety policies on a far larger scale than has been previously possible. 
To date, certifiers based on proof-carrying code technology have all had compiler- 
specific safety policies and no compiler has targeted the compiler-independent 
safety policies of Appel and Felty. Not only was TALx86 designed to be compiler- 
independent, but we and others have written three separate compilers that target 
TALx86. In this paper, we discuss our optimizing Popcorn compiler.^ This
compiler, itself a certified TALx86 program, is a several-hundred kilobyte
executable compiled from over eighteen thousand lines of source code.

3 Compiling to TALx86: An Extended Example 

In this section, we briefly review the structure of the TALx86 type system, its 
annotations, and the process of verification. In what follows, we present relevant 
TALx86 constructs as necessary, but for the purposes of this paper, it is sufficient 
to treat the types as low-level syntax for describing pre-conditions. Our purpose 
is not to dwell on the artifacts of TALx86 or its relative expressiveness. Rather, we 
want to give some intuition for the following claims, which we believe transcend 
TALx86: 

— If the safety policy does not bake in data and control abstractions, then the 
annotations that the compiler uses to encode them can become large. 

— In fact, the annotations describing compiler conventions consume much more 
space than the annotations that are specific to a particular source program. 

— Although the annotations for compiler conventions are large, they are also 
very uniform and repetitious, though they become much less so in the pres- 
ence of optimizations. 

Because of this focus, we purposely do not explain some aspects of the annota- 
tions other than to mention the general things they are encoding. The reader 
interested in such details should consult the literature.

A TALx86 object file consists of IA32 assembly-language instructions and 
data. As in a conventional assembly language, the instructions and data are 
organized into labeled sequences. Unlike conventional assembly language, some 
labels are equipped with a type annotation. The type annotations on the labels 
of instruction sequences, called code types, specify a pre-condition that must 
be satisfied before control may be transferred to the label. The pre-condition 
specifies, among other things, the types of registers and stack slots. For
example, if the code type annotating a label L is
{eax:int4, ebx:S(3), ecx:^*[int4,int4]}, then control may be transferred to the
address L only when the register eax contains a 4-byte integer, the register ebx
contains the integer value 3, and the register ecx contains a pointer (^) to a
record (*[...]) of two 4-byte integers.

^ The other compilers are a simple stack-based compiler for Popcorn and a compiler
for a core subset of Scheme.

Verification of code proceeds by taking each labeled instruction sequence and 
building a typing context that assumes registers have values with types as spec- 
ified by the pre-condition. Each instruction is then type-checked, in sequence, 
under the current set of context assumptions, possibly producing a new context. 
For most instructions, the verifier automatically infers a suitable typing post- 
condition in a style similar to dataflow analysis or strongest post-conditions. 
Some instructions require additional annotations to help the verifier. For exam- 
ple, it is sometimes necessary to explicitly coerce values to a supertype, or to 
explicitly instantiate polymorphic type variables. 
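The checking loop described above can be sketched on a toy instruction set. Everything here is an assumption made for illustration: the instruction names (`movi`, `mov`, `add`, `jmp`), the representation of a typing context as a dictionary from registers to type names, and the use of exact equality where the real verifier uses subtyping.

```python
INT4 = "int4"


def check_block(pre, instrs, label_types):
    """Type-check one labeled instruction sequence.

    pre         -- dict from register name to type name (the pre-condition)
    instrs      -- list of tuples such as ("mov", dst, src) or ("jmp", label)
    label_types -- dict from label to its annotated pre-condition
    Returns the typing context after the last instruction; raises TypeError
    if some instruction cannot be shown safe.
    """
    ctx = dict(pre)  # the current typing context, updated per instruction
    for ins in instrs:
        op = ins[0]
        if op == "movi":      # load an integer constant into a register
            ctx[ins[1]] = INT4
        elif op == "mov":     # copy a register together with its type
            if ins[2] not in ctx:
                raise TypeError(f"{ins[2]} has no type")
            ctx[ins[1]] = ctx[ins[2]]
        elif op == "add":     # arithmetic is allowed only on int4 values
            if ctx.get(ins[1]) != INT4 or ctx.get(ins[2]) != INT4:
                raise TypeError("add needs int4 operands")
        elif op == "jmp":     # the context must satisfy the target's pre-condition
            for reg, ty in label_types[ins[1]].items():
                if ctx.get(reg) != ty:
                    raise TypeError(f"{reg} does not have type {ty} at jump")
        else:
            raise TypeError(f"unknown instruction {op}")
    return ctx


labels = {"L": {"eax": INT4}}
final = check_block({"ebx": INT4},
                    [("movi", "eax"), ("add", "eax", "ebx"), ("jmp", "L")],
                    labels)
```

Each branch plays the role of an inferred post-condition: the checker never needs an annotation for `mov` or `add` because the new context follows from the old one and the instruction.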

Not all labels require type annotations. However, code blocks without anno- 
tations may be checked multiple times under different contexts, depending on 
the control-flow paths of the program. To ensure termination of verification, the 
type-checker requires annotations on labels that are moved into a register, the
stack, or a data structure (such as a closure); on labels that are the targets of
backwards branches (such as loop headers); and on labels that are exported from
the object file (such as function entry points). These restrictions are sufficient
for verification to terminate. We discuss labels without explicit types in more
detail in Section 4.3.

As in a conventional compiler, our certifying compiler translates the high- 
level control-flow constructs of Popcorn into suitable collections of labeled in- 
struction sequences and control transfers. For present purposes, control flow in 
Popcorn takes one of three forms: 

— an intra-procedural jump 

— a function call or return 

— an invocation of the current exception handler 

Currently, our compiler performs only intra-procedural optimizations, so the 
code types for function-entry labels are quite uniform and can be derived sys- 
tematically from the source-level function’s type. For simplicity, we discuss these 
code types first. We then discuss the code types for labels internal to functions, 
focusing on why they are more complicated than function entries. We emphasize 
that the distinction between the different flavors of code labels (function entries, 
internal labels, exception handlers) is a Popcorn convention encoded in the pre- 
conditions and is in no way specific to TALx86. Indeed, we have constructed 
other toy compilers that use radically different conventions. 

^ Return addresses are an important exception; they do not need explicit types.

3.1 Function-Entry Labels

As a running example, we consider a Popcorn function foo that takes one
parameter, an int, and returns an int. The Popcorn type int is compiled to the
TALx86 type int4. Arithmetic operations are allowed on values of this type;
treating them as pointers is not. Our compiler uses the standard C calling
convention for the IA32 architecture. Under this convention, the parameters are
passed on the stack, the return address is shallowest on the stack, the return
value is passed in register eax, and the caller pops the parameters upon return.
All of these specifics are encoded in TALx86 by giving foo this pre-condition:

foo: ∀s:Ts. {esp: {eax: int4
                   esp: int4::s}
                  ::int4
                  ::s}

The pre-condition for foo concerns only esp (the stack pointer) and requires 
that this register point to a stack that contains a return address (which itself 
has a pre-condition), then an int4 (i.e. the parameter), and then some stack, 
s. The return address expects an int4 in register eax and the stack to have 
shape int4::s. (The int4 is there because the caller pops the parameters.) The
pre-condition is polymorphic over the “rest” of the stack as indicated by the
universal quantification over the stack-type variable s. This technique allows a
caller to abstract the current type of the stack upon entry, and it ensures that 
the type is preserved upon return. Types in TALx86 are classified into kinds 
(types of types), so that we do not confuse “standard” types such as int4 with 
“non-standard” types such as stack types. To maintain the distinction, we must 
label the bound type variable s with its kind (Ts). 

Notice that our annotation already includes much more information than it 
would need to if the safety policy dictated a calling convention. In that case, we 
would presumably just give the parameter types and return type of the function. 
Some systems, including the certifier for Special J, go even further: they
encode the types in the string for the label, so it appears that no annotation 
is necessary. Of course, the safety policy now attaches specific meaning to the 
characters in a label; the annotations are encoded in the assembly listing. 

Our annotation does not quite describe the standard C calling convention. 
In particular, the standard requires registers ebx, esi, and edi to be callee- 
save. (It also requires ebp, traditionally the frame pointer, to be callee-save. Our 
compiler uses ebp for the exception handler.) We encode callee-save registers 
using polymorphism^ 

foo: ∀s:Ts a1:T4 a2:T4 a3:T4.
    {esp: {eax:int4 esp:int4::s ebx:a1 esi:a2 edi:a3}
          ::int4::s
     ebx:a1 esi:a2 edi:a3}

This pre-condition indicates that for any standard types a1, a2, a3, the
appropriate registers must have those types before foo is called and again when
the return address is invoked. This annotation restricts the behavior of foo to
preserve these registers because it does not know of any other values with these
types. More formally, this fact follows from parametricity. Notice that if
we wish to use different conventions about which registers should be callee-save,
then we need to change only the pre-condition on foo. In particular, we do not
need to change the underlying type system of TALx86.

^ Here and below, underlining is only for emphasis.

^ The kind T4 includes all types whose values fit in a register.

Much more detail is required to encode our compiler’s translation of exception 
handling, so we just sketch the main ideas. We reserve register ebp to point
into the middle of the stack where a pointer to the current exception handler
resides. This handler expects an exception packet in register eax. Because foo
might need to raise an exception, its pre-condition must encode this strategy.
Also, it must encode that if foo returns normally, the exception handler is still in
ebp. We express all these details below, where @ is an infix operator for appending
two stack types. 

foo: ∀s1:Ts s2:Ts a1:T4 a2:T4 a3:T4.
    {esp: {eax:int4
           esp: int4::s1@{esp:s2 eax:exn}::s2
           ebp: {esp:s2 eax:exn}::s2
           ebx:a1 esi:a2 edi:a3}
          ::int4::s1@{esp:s2 eax:exn}::s2
     ebp: {esp:s2 eax:exn}::s2
     ebx:a1 esi:a2 edi:a3}

We urge the reader not to focus on the details other than to notice that 
none of the additions are particular to foo, nor would it be appropriate for a 
safety policy to bake in this specific treatment of exception handlers. Also, we 
have assumed there is a type exn for exception packets. TALx86 does not provide 
this type directly, so our compiler must encode its own representation using an 
extensible sum. Each of the four occurrences of exn above should in fact be
replaced by the type

    ∃c:Tm [(^T^rw(c) * [int4^rw])^rw, c]

but in the interest of type-setting, we spare the reader the result. 

For the sake of completeness, we offer a final amendment to make this pre- 
condition correct. Our compiler schedules function calls while some heap records 
may be partially initialized. This strategy is arguably better than the “don’t 
optimize” approach of always initializing records within a basic block, but it re- 
quires that we convince the verifier that no aliases to partially initialized records 
escape. In particular, the pre-condition for foo uses two capability variables
as shown below to indicate that it does not create any aliases to partially
initialized records reachable from the caller or exception handler.

foo: ∀s1:Ts s2:Ts e1:Tcap e2:Tcap a1:T4 a2:T4 a3:T4.
    {esp: {eax:int4
           esp: int4::s1@{esp:s2 eax:exn cap:e2}::s2
           ebp: {esp:s2 eax:exn cap:e2}::s2
           ebx:a1 esi:a2 edi:a3
           cap: &[e1,e2]}
          ::int4::s1@{esp:s2 eax:exn cap:e2}::s2
     ebp: {esp:s2 eax:exn cap:e2}::s2
     ebx:a1 esi:a2 edi:a3
     cap: &[e1,e2]}

^ The constructor &[...] joins two capabilities to produce a harder-to-satisfy
capability; we omit its definition.

In short, because our compiler has complicated inter-procedural invariants, 
the naive encoding into TALx86 is anything but concise. (The unconvinced reader 
is invited to encode a function that takes a function pointer as a parameter.) 
However, the only parts particular to our example function foo are the return 
type, which is written once, and the parameter types, which are written twice. 
Even these parts are the same for all functions that take and return integers. 

3.2 Internal Labels 

In this section, we present the pre-conditions for labels that are targets of intra- 
procedural jumps. For simplicity, we consider only functions that do not declare 
any local exception handlers. This special case is by far the most common, so it is 
worth considering explicitly. Because our compiler does perform intra-procedural 
optimizations, most relevantly register allocation, the pre-conditions for internal 
labels are less uniform than those for function-entry labels. Specifically, they 
must encode several properties about the program point that the label desig- 
nates: 

— A local variable may reside in a register or on the stack. 

— Some stack slots may not hold live values, so along different control-flow 
paths to the label, a stack slot may have values of different types. 

— Some callee-save values may reside on the stack while others remain in reg- 
isters. 

— Some heap records may be partially initialized. 

First we describe the relevant aspects of our term translation: Any callee-save 
values that cannot remain in registers are stored on the stack in the function 
prologue and restored into registers in the function epilogue. The space for this 
storage is just shallower than the return address. Local variables that do not 
fit in registers are stored in “spill slots” that are shallowest on the stack. The 
number of spill slots remains constant in the body of a function. This strategy 
is fairly normal, but it is far too specific to be dictated by TALx86. Indeed, our 
original Popcorn compiler did not perform register allocation; it simply pushed 
and popped variables on the stack as needed. 

The pre-condition for internal labels gives the type and location (register or 
spill slot) for each live local variable. If a stack slot is not live, we must still give it
some “place-holder” type so that the stack type describes a stack of the correct
size. Different control-flow paths may use the same stack slot for temporary
variables of different types. In these cases, no previously seen type can serve as
this place-holder. TALx86 provides a primitive type top4 which is a supertype
of all types ranging over word-sized values. We give this type to the dead stack
slots at the control-flow join; the appropriate subtyping on control transfers is
handled implicitly by the verifier.
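To make the role of top4 concrete, the following sketch computes the stack type a compiler might write at a control-flow join. It assumes a deliberately tiny subtyping relation in which top4 is above every word-sized type and distinct types are otherwise unrelated; the function names and the list-of-strings stack representation are hypothetical.

```python
TOP4 = "top4"


def join_slot(t1, t2):
    """Type for one word-sized slot reached along two paths.

    Toy lattice (an assumption): equal types join to themselves, and any two
    different types join to top4.
    """
    return t1 if t1 == t2 else TOP4


def join_stacks(s1, s2):
    """Join two stack types slot by slot; both must describe the same size."""
    if len(s1) != len(s2):
        raise ValueError("stacks of different sizes cannot be joined")
    return [join_slot(a, b) for a, b in zip(s1, s2)]


# The middle slot held a pointer on one path and an integer on the other.
joined = join_stacks(["int4", "^*[int4]", "a1"], ["int4", "int4", "a1"])
```

A slot that joins to top4 can still be overwritten, but nothing useful can be read from it, which is exactly what we want for a dead slot.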

In addition to live variables, all of the invariants involving the stack, the 
exception handler, etc. must be preserved as control flows through labels, so this 
information looks much as it does for function-entry labels. 

For example, suppose our function foo uses all of the callee-save registers 
and needs three spill slots. Furthermore, suppose that at an internal label, l,
there are two live variables, both of type int4, one in register esi and one in
the middle spill slot. Then a correct pre-condition for l is:

l: ∀s1:Ts s2:Ts e1:Tcap e2:Tcap a1:T4 a2:T4 a3:T4.
    {esp: top4::int4::top4::a3::a2::a1
          ::{eax:int4
             esp: int4::s1@{esp:s2 eax:exn cap:e2}::s2
             ebp: {esp:s2 eax:exn cap:e2}::s2
             ebx:a1 esi:a2 edi:a3
             cap: &[e1,e2]}
          ::int4::s1@{esp:s2 eax:exn cap:e2}::s2
     ebp: {esp:s2 eax:exn cap:e2}::s2
     cap: &[e1,e2]
     esi: int4}

Our register allocator tries not to use callee-save registers so that functions 
do not have to save and restore them. For example, suppose registers esi and 
edi are not used in a function. Then internal labels will encode that a value of 
type a1 is on the stack in the appropriate place, esi contains a value of type a2,
and edi contains a value of type a3. 

If one or more records were partially initialized on entry to l, then the
pre-condition would have a more complicated capability; we omit the details. What
should be clear at this point is that the type annotations for internal labels are 
considerably less uniform than function-entry annotations. 

4 Recovering Conciseness and Efficiency 

Continuing the examples from the previous section, we describe three techniques 
for reducing the size of annotations. We then discuss techniques, most notably 
hash-consing, that can reduce the space and time required during verification. 
The next section quantifies the effectiveness of these and other techniques. 



It is theoretically possible to use polymorphism instead of a supertype, but in practice 
we found doing so very unwieldy. 
4.1 Sharing Common Subterms

Because the annotations repeat information, we can greatly reduce their total 
size by replacing identical terms with a pointer to a shared term. As an example, 
consider again the pre-condition for the function foo, which takes and returns 
an int: 

type exn = ∃c:Tm [(^T^rw(c) * [int4^rw])^rw, c]

foo: ∀s1:Ts s2:Ts e1:Tcap e2:Tcap a1:T4 a2:T4 a3:T4.
    {esp: {eax:int4
           esp: int4::s1@{esp:s2 eax:exn cap:e2}::s2
           ebp: {esp:s2 eax:exn cap:e2}::s2
           ebx:a1 esi:a2 edi:a3
           cap: &[e1,e2]}
          ::int4::s1@{esp:s2 eax:exn cap:e2}::s2
     ebp: {esp:s2 eax:exn cap:e2}::s2
     ebx:a1 esi:a2 edi:a3
     cap: &[e1,e2]}

Removing some common subterms by hand, we can represent the same in- 
formation with the following pseudo-annotation: 

1 = ∃c:Tm [(^T^rw(c) * [int4^rw])^rw, c]
2 = &[e1,e2]
3 = {esp:s2 eax: 1 cap:e2}::s2
4 = int4::s1@ 3
5 = {eax:int4 esp: 4 ebp: 3
     ebx:a1 esi:a2 edi:a3 cap: 2 }:: 4

foo: ∀s1:Ts s2:Ts e1:Tcap e2:Tcap a1:T4 a2:T4 a3:T4.
    {esp: 5 ebp: 3 ebx:a1 esi:a2 edi:a3 cap: 2 }

Other pre-conditions can share subterms with this one. For example, the 
pre-condition for l from the previous section can be rewritten as:

l: ∀s1:Ts s2:Ts e1:Tcap e2:Tcap a1:T4 a2:T4 a3:T4.
    {esp: top4::int4::top4::a3::a2::a1:: 5
     ebp: 3 cap: 2 esi:int4}

Despite exploiting significant sharing, this example illustrates some
limitations of sharing common subterms. First, we would like to share all the
occurrences of “s1:Ts s2:Ts ... a3:T4”, but whether or not we can do so depends
on the abstract syntax of the language. Second, pre-conditions for functions with 
different parameter types or return types cannot exploit subterms 4 or 5. An- 
other possible shortcoming not demonstrated is that alpha-equivalent terms may 
not appear to be the same. In practice, compilers can re-use variable names for 
compiler-introduced variables, so detecting alpha-equivalence for the purpose of 
sharing is not so important. 
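The saving from sharing can be measured by comparing the size of a type written out as a tree with the number of distinct subterms it contains. The sketch below assumes a hypothetical representation in which a type is either a string (a leaf) or a tuple whose first element is a constructor tag; it is not the TALx86 binary encoding.

```python
def tree_size(t):
    """Number of nodes when every subterm is written out in full."""
    if isinstance(t, tuple):
        return 1 + sum(tree_size(c) for c in t[1:])
    return 1


def dag_size(t, seen=None):
    """Number of distinct subterms, i.e. nodes once identical terms are shared."""
    seen = set() if seen is None else seen
    if t in seen:
        return 0          # already emitted once; a pointer suffices
    seen.add(t)
    if isinstance(t, tuple):
        return 1 + sum(dag_size(c, seen) for c in t[1:])
    return 1


# A handler code type, the stack that starts with it, and a pre-condition
# that mentions that stack three times (loosely modeled on the foo example).
handler = ("code", "s2", "exn", "e2")
handler_stack = ("cons", handler, "s2")
pre = ("pre", handler_stack, handler_stack, handler_stack)
```

Here `tree_size(pre)` is 19 while `dag_size(pre)` is 6. The comparison is purely syntactic, so alpha-equivalent terms with different variable names would not be shared.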
4.2 Parameterized Abbreviations

TALx86 provides user-defined (i.e., compiler-defined) higher-order type
constructors. These functions from types to types have several uses. For example, they are
necessary to encode source-level type constructors, such as array, list, or object 
types. Here we show how to use higher-order type constructors to define param- 
eterized abbreviations. These abbreviations can exploit sharing among different 
types that sharing common subterms cannot. However, our verifier is unable to 
exploit such abbreviations during verification for reasons we explain below. 

Because every function-entry pre-condition that our compiler creates is the 
same except for its parameter types and return type, we can create a parame- 
terized abbreviation that describes the generic situation. Then at each function- 
entry label, we apply the abbreviation to the appropriate types. 

type F = fn params:Ts ret:T4.
    ∀s1:Ts s2:Ts e1:Tcap e2:Tcap a1:T4 a2:T4 a3:T4.
    {esp: {eax: ret
           esp: params@s1@{esp:s2 eax:exn cap:e2}::s2
           ebp: {esp:s2 eax:exn cap:e2}::s2
           ebx:a1 esi:a2 edi:a3
           cap: &[e1,e2]}
          ::params@s1@{esp:s2 eax:exn cap:e2}::s2
     ebp: {esp:s2 eax:exn cap:e2}::s2
     ebx:a1 esi:a2 edi:a3
     cap: &[e1,e2]}

foo: F int4::se int4

The only new feature other than the abbreviation is the type se which de- 
scribes empty stacks. We use it here to terminate a list of parameter types. The 
use of abbreviations greatly simplifies the structure of the compiler because it 
centralizes invariants such as calling conventions. 
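One way to picture a parameterized abbreviation is as an ordinary function that builds the full pre-condition from the two parts that vary. The sketch below does this with strings; the name `expand_F`, the constant `HANDLER`, and the textual format are assumptions for illustration and do not reflect how TALx86 represents type constructors internally.

```python
# The handler stack that every function-entry pre-condition repeats.
HANDLER = "{esp:s2 eax:exn cap:e2}::s2"


def expand_F(params, ret):
    """Expand the abbreviation F applied to a parameter stack and a return type."""
    rest = params + "@s1@" + HANDLER
    ret_addr = ("{eax:" + ret + " esp:" + rest + " ebp:" + HANDLER +
                " ebx:a1 esi:a2 edi:a3 cap:&[e1,e2]}")
    return ("{esp:" + ret_addr + "::" + rest + " ebp:" + HANDLER +
            " ebx:a1 esi:a2 edi:a3 cap:&[e1,e2]}")


applied = "F int4::se int4"              # what the compiler writes at the label
full = expand_F("int4::se", "int4")      # what the verifier must inspect
```

The applied form is a small fraction of the expanded one, which is the space saving. The cost is that a verifier has to perform the expansion before it can see, for example, what lies on top of the stack.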

It is not clear how a compiler-independent verifier could exploit an
abbreviation like F during verification. Suppose the first instruction in block foo
increments the input parameter. The verifier must check that given the pre-condition
F int4::se int4, it is safe to perform an increment of the value on top of the
stack. This verification requires inspecting the result of the abbreviation
application; the verifier does not know that the argument int4::se describes the
top of the stack. As we show in Section 5, using abbreviations sometimes slows
down verification because of this phenomenon.

The abbreviation F is widely useful because all function-entry pre-conditions
are similar. To use abbreviations for internal labels, we must capture the
additional properties that distinguish these pre-conditions. In addition to F’s
parameters, we also need parameters for the spill slots, the live registers, and something
to do with partial-initialization issues. We also use a primitive type constructor
(&) for combining two pre-conditions. That way we can pass in the live
registers as one pre-condition and merge it with a pre-condition that describes the
reserved registers.

type L =
    fn params:Ts ret:T4 spills:Ts part:Tcap regs:Tpre.
    ∀s1:Ts s2:Ts e1:Tcap e2:Tcap a1:T4 a2:T4 a3:T4.
    {esp: spills@a3::a2::a1
          ::{eax: ret
             esp: params@s1@{esp:s2 eax:exn cap:e2}::s2
             ebp: {esp:s2 eax:exn cap:e2}::s2
             ebx:a1 esi:a2 edi:a3
             cap: &[e1,e2]}
          ::params@s1@{esp:s2 eax:exn cap:e2}::s2
     ebp: {esp:s2 eax:exn cap:e2}::s2
     cap: &[part,e1,e2]}
    & regs

l: L int4::se int4
     top4::int4::top4::se ce {esi:int4}

L is correct, but it is useful only for labels in functions where all three callee- 
save values are stored on the stack. With a “don’t optimize” approach, we could 
make all functions meet this description, but we lose most of the advantages of 
callee-save registers as a result. A better approach is to provide 2^3 = 8 different
abbreviations, one for each combination of callee-save values being stored on the 
stack. In fact, we need only 4 such abbreviations because our register allocator 
uses the callee-save registers in a fixed order. Because the compiler provides the 
abbreviations, this specialization is possible and appropriate. 
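The count of abbreviations can be checked by enumeration. With three callee-save registers there are eight subsets that might be spilled, but if the allocator always claims the registers in one fixed order, only the prefixes of that order can occur. The particular order used below is an assumption for illustration.

```python
from itertools import combinations

# Assumed allocation order; the argument only needs the order to be fixed.
CALLEE_SAVE = ["ebx", "esi", "edi"]


def all_subsets(regs):
    """Every set of callee-save values that could in principle be on the stack."""
    return [set(c) for r in range(len(regs) + 1) for c in combinations(regs, r)]


def prefixes(regs):
    """Sets that can actually occur when registers are claimed in a fixed order."""
    return [set(regs[:k]) for k in range(len(regs) + 1)]


print(len(all_subsets(CALLEE_SAVE)), len(prefixes(CALLEE_SAVE)))
```

Each prefix corresponds to one specialized variant of L, from "nothing spilled" to "all three values spilled".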

Rather than require the compiler-writer to write and use higher-order ab- 
breviations, one might hope to write a tool that took a collection of TALx86 
types and re-wrote them in terms of some automatically generated abbrevi- 
ations. Creating an optimal result appears at least as difficult as finding the 
shortest simply-typed lambda calculus term equivalent to a given one. We have 
not investigated using heuristics to discover useful abbreviations. 

4.3 Eliding Pre-conditions 

Recall that the verifier checks a code block by assuming its pre-condition is 
true and then processing each instruction in turn, checking it for safety and 
computing a pre-condition for the remainder of the block. At a control transfer 
to another block, it suffices to ensure that the current pre-condition implies the 
pre-condition on the destination label. 

TALx86 uses a reconstruction approach by allowing many label pre-conditions 
to be elided. Clearly, the result of eliding a pre-condition is a direct decrease 
in annotation size. To check a control transfer to a block with an elided pre- 
condition, the verifier simply uses the current pre-condition at the source of the 
transfer to check the target block. Hence, if a block with elided pre-condition has
multiple control-flow predecessors, it is verified multiple times under (possibly) 
different pre-conditions. 

To ensure that the verifier terminates, we prohibit annotation-free loops in 
the control-flow graph. For this reason, TALx86 allows a pre-condition to be 
elided only if the block is only the target of forward jumps. Even with this 
restriction, the number of times a block is checked is the number of paths through 
the control-flow graph to the block such that no block on the path has an explicit 
pre-condition. This number can be exponential in the number of code blocks, so 
it is unwise to elide explicit pre-conditions indiscriminately. As the next section
demonstrates, an exponential number of paths is rare, but it does occur and it 
can have a disastrous effect on verification time. 
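The exponential case is easy to reproduce. The sketch below counts how many times each block would be checked in a forward-only control-flow graph, then applies the count to a chain of if-then-else diamonds. The graph representation and the helper names are assumptions for illustration, and blocks with no predecessors are assumed to carry explicit pre-conditions.

```python
def times_checked(succs, annotated):
    """Count verifications per block in a DAG of forward jumps.

    succs     -- dict from block to the list of blocks it jumps to
    annotated -- set of blocks with explicit pre-conditions (checked once each)
    """
    counts = {b: (1 if b in annotated else 0) for b in succs}
    indeg = {b: 0 for b in succs}
    for b in succs:
        for s in succs[b]:
            indeg[s] += 1
    # Visit blocks in topological order so a count is final before it is used.
    ready = [b for b in succs if indeg[b] == 0]
    while ready:
        b = ready.pop()
        for s in succs[b]:
            if s not in annotated:
                counts[s] += counts[b]  # one re-check per check of the source
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return counts


def diamond_chain(n):
    """n consecutive diamonds: j0 -> {a0, b0} -> j1 -> {a1, b1} -> ... -> jn."""
    succs = {f"j{n}": []}
    for i in range(n):
        succs[f"j{i}"] = [f"a{i}", f"b{i}"]
        succs[f"a{i}"] = [f"j{i + 1}"]
        succs[f"b{i}"] = [f"j{i + 1}"]
    return succs


graph = diamond_chain(10)
no_elision_limit = times_checked(graph, {"j0"})
joins_annotated = times_checked(graph, {f"j{i}" for i in range(11)})
```

With only the entry annotated, the last join of ten diamonds is checked 1024 times; annotating every join brings every count back to one.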

The approach our compiler takes is to set an elision threshold, T, and insist 
that no code block is verified more than T times. Notice T = 1 means all merge
points have explicit pre-conditions. We interpret T = 0 to mean that all code
labels, even those with a single predecessor, have explicit pre-conditions. For
higher values of T, we expect space requirements to decrease, but verification
time to increase. Given a value for T, we might like to minimize the number
of labels that have explicit pre-conditions. Unfortunately, we have proven that
this problem is NP-Complete for T ≥ 3. (We do not know the tractability when
T = 2.) Currently, the compiler does a greedy depth-first traversal of the
control-flow graph, leaving off pre-conditions until the threshold demands otherwise. In
pathological cases, this heuristic can do arbitrarily poorly, but it seems to do 
well in practice. 
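A threshold-driven placement can be sketched as follows. This is not our compiler's depth-first algorithm; it is a simpler greedy pass in topological order, written under the assumptions that the graph contains only forward jumps and that a block is given an explicit pre-condition as soon as eliding it would exceed T checks. The names `place_annotations` and `required` are hypothetical.

```python
def place_annotations(succs, required, T):
    """Greedily choose which blocks get explicit pre-conditions.

    succs    -- dict from block to the list of blocks it jumps to (a DAG)
    required -- blocks that must be annotated regardless (e.g. exported labels)
    T        -- elision threshold: no block may be checked more than T times
    Returns (annotated blocks, number of checks per block).
    """
    preds = {b: [] for b in succs}
    for b in succs:
        for s in succs[b]:
            preds[s].append(b)
    indeg = {b: len(preds[b]) for b in succs}
    ready = [b for b in succs if indeg[b] == 0]
    annotated, counts = set(), {}
    while ready:
        b = ready.pop()
        inherited = sum(counts[p] for p in preds[b])
        if b in required or not preds[b] or T == 0 or inherited > T:
            annotated.add(b)      # explicit pre-condition: checked exactly once
            counts[b] = 1
        else:
            counts[b] = inherited  # elided: re-checked once per incoming check
        for s in succs[b]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return annotated, counts


# Four if-then-else diamonds in a row: j0 -> {a0, b0} -> j1 -> ... -> j4.
graph = {"j4": []}
for i in range(4):
    graph[f"j{i}"] = [f"a{i}", f"b{i}"]
    graph[f"a{i}"] = [f"j{i + 1}"]
    graph[f"b{i}"] = [f"j{i + 1}"]

chosen, checks = place_annotations(graph, set(), T=4)
```

With T = 4 only the entry j0 and the join j3 are annotated, and no block is checked more than four times. Setting T = 1 annotates every join, and T = 0 annotates every block, matching the interpretation given above.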

Using an elision threshold is actually over-constraining the problem — it is 
more important to minimize the total number of times that we verify blocks. 
That is, we would prefer to verify some block more than T times in order to 
verify several other blocks many fewer times. For structured programs (all
intra-procedural jumps are for loops and conditionals), it appears that this relaxed
problem can be solved in polynomial time (O(n®), where n is the number of
blocks), but the algorithm seems impractical.

4.4 Hash-Consing and Fast Type Operations 

So far, we have discussed techniques for reducing the size of the annotations that 
the code producer writes. For the verifier, these explicit types provide guidance 
to check that each assembly instruction is safe. To do this checking, the verifier 
determines the type of the context (i.e., the registers and the stack) before the 
instruction, the types of the operands, and the type of the context after the 
instruction. The operands must be subtypes of the types that the instruction 
requires. In short, the verifier itself creates many type expressions and often 
checks that one is a subtype of another. Therefore, it is important that these 
operations consume as little time and space as possible. 

Our primary technique for reducing space is hash-consing, which is essentially 
just the on-line form of sharing. As types are created, we first check a table to 
see if they have been created before. If so, we return a pointer to the table entry; 
if not, we put the type in the table. As a result, types consume less space, but we
incur the overhead of managing a table. It would be correct to return any alpha- 
equivalent type from the table, but in the interest of fast lookup operations, we 
find only a syntactically identical type. 
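
The table lookup described above can be illustrated with a minimal sketch. It is not the verifier's code (which is written in Objective Caml); the names `Type`, `HashConsTable`, and `make` are hypothetical, and the sketch assumes that every child type passed to `make` was itself obtained from the same table.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True, eq=False)
class Type:
    """An immutable type node; eq=False keeps comparison by identity."""
    ctor: str
    args: Tuple["Type", ...] = ()


class HashConsTable:
    """Returns one shared node per syntactically distinct type."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[str, Tuple[int, ...]], Type] = {}

    def make(self, ctor: str, *args: Type) -> Type:
        # Children came from this table, so their identities form a valid key.
        key = (ctor, tuple(id(a) for a in args))
        node = self._table.get(key)
        if node is None:          # first occurrence: record it
            node = Type(ctor, args)
            self._table[key] = node
        return node               # later occurrences: reuse the entry

    def __len__(self) -> int:
        return len(self._table)


table = HashConsTable()
int_ty = table.make("int")
f = table.make("arrow", int_ty, int_ty)
g = table.make("arrow", table.make("int"), table.make("int"))
```

Here `f` and `g` are the same object, and the table holds only two entries. Because the key is purely syntactic, two alpha-equivalent types with different bound-variable names would still occupy separate entries, as in the text.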

In the most general case, to decide if τ is a subtype of τ', we should convert
both types to normal form and then do a structural subtyping comparison. One
common case for which it is easy to optimize is when τ and τ' are the same
object, that is, they are pointer-equal. With hash-consing, syntactically equal
types should always be pointer-equal. Even when the two types are not the
same object (for example, one is a strict subtype of the other), many parts of
the two types may be pointer-equal, so we can usually avoid a full structural
comparison.
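
A small sketch may make the saving concrete. The subtyping rules below are assumptions chosen for illustration (a single base rule and covariant constructors); they are not the TALx86 rules, and `Ty` and `SubtypeChecker` are hypothetical names. The point is only the identity test at the top of `subtype`.

```python
class Ty:
    """A type node; the fast path compares nodes by identity."""

    def __init__(self, ctor, *args):
        self.ctor = ctor
        self.args = args


class SubtypeChecker:
    def __init__(self, base_rules):
        self.base_rules = base_rules   # hypothetical pairs (sub, sup) of base types
        self.calls = 0                 # counts calls, to make the saving visible

    def subtype(self, s, t):
        self.calls += 1
        if s is t:                     # pointer-equal: nothing more to check
            return True
        if not s.args and not t.args:  # two base types
            return s.ctor == t.ctor or (s.ctor, t.ctor) in self.base_rules
        if s.ctor != t.ctor or len(s.args) != len(t.args):
            return False
        # Assumed rule: every constructor is covariant in every argument.
        return all(self.subtype(a, b) for a, b in zip(s.args, t.args))
```

When two pair types share their first component as one object, the checker never descends into that component; with unshared copies it must visit every node.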

There are complications with pointer equality, however: We must consider 
alpha-equivalent types to be equal. To do so, we maintain a separate variable- 
substitution map rather than actually performing costly type substitutions. In 
the presence of a non-empty map, it is not necessarily correct that pointer-equal 
types are equal because the substitution has not been applied. Fortunately, our 
compiler uses the same type variables consistently, so the variable-substitution 
map is almost always empty. 

Hash-consing has another positive effect on verification time: When we reduce
τ to τ' (for example, by applying an abbreviation), we do an in-place update of
τ. Hence, all pointers to a shared τ will use the result of the single reduction.
However, the original τ will no longer appear to be in the hash-cons table. We
could add a level of indirection to alleviate this shortcoming (keep τ in the table
for the purpose of future sharing and have it point to its reduced form τ'), but
our implementation does not currently do so.

Another common operation on types is substitution, that is, substituting τ'
for a variable α in τ. Operations that need substitution include applying abbre-
viations and instantiating polymorphic types. We need to recursively substitute
for α in all the constituent types within τ, but we expect that most of them do
not contain α. To optimize for this common case and avoid crawling over much
of τ, we memoize the free variables of each type and store this set with the type.
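
The following sketch shows one way such memoization could look. It is an illustration under simplifying assumptions: the type language has no binders, and the names `Ty`, `free`, and `substitute` are hypothetical rather than taken from the verifier.

```python
class Ty:
    """A type node that stores its free-variable set when it is built."""

    def __init__(self, ctor, *args, var=None):
        self.ctor, self.args, self.var = ctor, args, var
        if ctor == "var":
            self.free = frozenset([var])
        else:
            # Union of the children's stored sets; no traversal is repeated.
            self.free = frozenset().union(*(a.free for a in args))


def substitute(ty, name, replacement):
    """Replace variable `name` in `ty`, skipping subterms that lack it."""
    if name not in ty.free:
        return ty                      # common case: reuse the node untouched
    if ty.ctor == "var":
        return replacement
    return Ty(ty.ctor, *(substitute(a, name, replacement) for a in ty.args))
```

A subterm whose stored set does not mention the variable is returned as the same object, so sharing is preserved and the subterm is never traversed.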

As discussed in Section 5.4, we might expect further benefits from using
de Bruijn indices and performing type substitutions lazily. Unfortunately, this 
change was so pervasive that we chose not to investigate it in the experiments 
that we discuss in the next section. 

5 Experimental Results 

In this section, we present our quantitative study of certifying a real program 
in TALx86. We conclude that targeting compiler-independent safety policies is 
practical and scalable when appropriate techniques are used. 

Our example is the Popcorn compiler itself. The compiler consists of 39 Popcorn
source files compiled separately. The more interesting optimizations performed
are Chaitin-style intra-procedural register allocation (using optimistic
spilling and conservative coalescing) and the elimination of fully redundant
null-checks for object dereferences. The entire compiler is roughly 18,000
lines of source code and compiles to 816 kilobytes of object code (335 kilobytes
after running strip).

The sizes we report include the sum across files of all annotations, not just 
those for code labels. They do not include the separate module-interface files that 
the TALx86 link-checker uses to ensure type-safe linking. All execution times were 
measured on a 266MHz Pentium II with 64MB of RAM running Windows NT 
4.0. The verifier and assembler are written in Objective Caml and compiled
to native code. 

We first show that naive choices in the annotation language and compiler 
can produce a system with unacceptable space and/or time overhead. Then 
we show that our actual implementation avoids these pitfalls. Next we adjust 
various parameters and disable various techniques to discover the usefulness of 
individual approaches and how they interact. Finally, we discuss how we could 
extend our techniques to further lower the TALx86 overhead. 

5.1 Two Bad Approaches 

A simple encoding of the TALx86 annotations is insufficient. First, consider a 
system where we do not use the abbreviations developed earlier, our type
annotations repeat types rather than share them, and we put types on all code 
labels. Then the total annotation size for our program is over 4.5 megabytes, 
several times the size of the object code. As for verification time, if we make no 
attempt to share common subterms created during verification, then it takes 59 
seconds to verify all of the files. 

A second possibility is to remove as many pre-conditions as possible. That 
is, we put an explicit pre-condition on a code label only if the label is used as a 
call destination, a backwards-branch destination, or a first-class value. Indeed, 
the total size of our annotations drops to 1.85 megabytes. However, the verifier 
now checks some code blocks a very large number of times. Total verification 
time rises to 18 minutes and 30 seconds. 

These two coarse experiments yield some immediate conclusions. First, the 
actual amount of safety information describing a compiled program is large. 
Second, the number of loop-free paths through our application code is, in places, 
much larger than the size of the code. Therefore, it is unwise to make verification 
time proportional to the number of loop-free paths as the second approach does. 
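
To see how quickly loop-free paths can outnumber blocks, consider a hypothetical control-flow graph made of n if/else diamonds in sequence. The sketch below (the graph shape and the function names are our own illustration, not data from the compiler) counts blocks and paths for such a graph.

```python
from functools import lru_cache


def diamond_chain(n):
    """Build n sequential if/else diamonds (n >= 1) as {block: [successors]}."""
    cfg = {}
    for i in range(n):
        join = f"head{i + 1}" if i + 1 < n else "exit"
        cfg[f"head{i}"] = [f"then{i}", f"else{i}"]
        cfg[f"then{i}"] = [join]
        cfg[f"else{i}"] = [join]
    cfg["exit"] = []
    return cfg


def count_paths(cfg, start, end):
    """Count distinct paths from start to end in an acyclic graph."""
    @lru_cache(maxsize=None)
    def paths(block):
        if block == end:
            return 1
        return sum(paths(succ) for succ in cfg[block])
    return paths(start)


cfg = diamond_chain(10)
num_blocks = len(cfg)                          # 3 * 10 + 1 blocks
num_paths = count_paths(cfg, "head0", "exit")  # doubles with every diamond
```

With ten diamonds there are 31 blocks but 1024 paths. If no block on the way carries a pre-condition, the final block is re-verified once per path.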

The latter conclusion is important for certified-code frameworks that con- 
struct verification conditions at verification time via a form of weakest pre- 
condition computation. Essentially, such systems construct pre-conditions for 
loop-free code segments using a backward flow analysis. In an expressive sys- 
tem, the pre-condition at a backward merge-point could be the logical disjunc- 
tion of two conditions. Hence, if done naively, the constructed condition can have 
exponential size by having a different clause for every loop-free path.

When the number of loop-free paths is large, it is clear that constructing an 
enormous pre-condition is wasteful. For a compiler to exploit the weakness of 
such a pre-condition, it would need to have optimized based on an exponential
amount of path-sensitive information. We conclude that constructing weakest 
pre-conditions in this way is impractical. Instead, annotations should guide the 
construction of the verification condition; the optional code pre-conditions of 
TALx86 fill this role. 

5.2 A Usable System 

Having shown how bad matters can get, we now present the actual overhead that 
our system achieves. First, we identify the main techniques used and the overhead 
that results. Then we show that verification time is roughly proportional to file 
size; this fact suggests that our approach should scale to larger applications. 
Finally, we partition the source code into several styles, show that the overhead 
is reasonable for all of them, and discuss salient differences. 

Unlike the “straw man” systems constructed above, the real encoding of 
TALx86 annotations uses several tables to share common occurrences. Specif- 
ically, uses of identifiers, types, kinds, and coercions are actually indices into 
tables that contain the annotations. The code producer can avoid duplicates 
when constructing the tables. The benefit of this approach is proportional to 
the amount of repetition; there is a small penalty for annotations that occur 
only once. We call this technique “sharing”; more specifically, it is full common-
subexpression elimination on types at the file level. Sharing is just off-line hash-
consing; we use the latter term to refer to sharing within the verifier for types 
created during verification. 
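
As a rough illustration of table-based sharing, the sketch below flattens nested type terms into a table whose entries refer to their children by index. The term representation (nested tuples) and the function name `share` are assumptions made for the example; the actual binary encoding is not shown in this paper.

```python
def share(types):
    """Encode nested tuples (ctor, child, ...) as a table of (ctor, index, ...).

    Returns the table and, for each input type, the index of its entry.
    """
    table, index = [], {}

    def intern(term):
        ctor, *children = term
        entry = (ctor, *(intern(child) for child in children))
        if entry not in index:        # first occurrence gets a new slot
            index[entry] = len(table)
            table.append(entry)
        return index[entry]           # repeats become a reference

    refs = [intern(term) for term in types]
    return table, refs


int_ty = ("int",)
fn_ty = ("arrow", int_ty, int_ty)
table, refs = share([fn_ty, ("pair", fn_ty, int_ty), fn_ty])
```

Each distinct subterm is stored once, so the three annotations above need only three table entries; an annotation that occurs once still pays for its index, which is the small penalty mentioned in the text.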

Sharing does not create parameterized abbreviations, so we also use the
abbreviations developed earlier. The compiler provides the abbreviations and
uses them in a text version of TALx86. An independent tool converts the text 
version into a binary version that has sharing. In this sense, we use abbreviations 
“before” sharing. 

We set the elision threshold to four. At this value, many forward control-flow
points will not need explicit pre-conditions, but no block is verified more than 
four times. 

Finally, the verifier uses hash-consing to share types that are created during 
verification. That is, when creating a new type, the verifier consults a table to 
see if it has encountered the type previously. If so, it uses the type in the ta- 
ble. Because the entire sharing table is parsed prior to verification, any types 
in the table will be used rather than repeated. Reductions on higher-order type 
constructors are performed in a lazy manner. In particular, we use a weak-head 
normalization strategy with memoization to avoid both unnecessary reductions 
and duplicated reductions. As such, other uses of the type will not have to recompute
the reduction. Shao and associates use a similar strategy. Because of
complications with the scope of abbreviations, the hash-consing table is emptied 
before verifying each file. If memory becomes scarce, we could empty the table 
at any point, but this measure has not been necessary in practice. Note that the 
use of hash-consing cannot affect the size of explicit annotations; hash-consing 
attempts to share types created during verification. 

With this system, total annotation size drops from 4.5 megabytes to 419
kilobytes and verification time drops from 59 to 34.5 seconds. As for compila- 
tion time, our compiler takes 40 seconds to compile the Popcorn source files 
into ASCII TALx86 files, which are essentially Microsoft Assembler (MASM) 
files augmented with annotations. A separate tool takes 23 seconds to assemble 
all of these files; this time includes the creation of the binary encoding of the 
annotations with sharing. As we add more optimizations to our compiler, we 
expect compilation time to increase more than verification time. The latter may 
actually decrease as object-code size decreases. 

Performing gzip compression on the 419 kilobytes of annotations reduces 
their size to 163 kilobytes. The ratio of compression is similar to that for our 
object files; the unstripped files compress from 816 to 252 kilobytes and the 
stripped files compress from 335 to 102 kilobytes. 

A desirable property is that verification time is generally proportional to file 
size. Without eliding pre-conditions, the time to verify TALx86 code is propor- 
tional to the size of the code plus the size of the normalized types used as anno- 
tations plus the time to look up types in the context. However, with higher-order 
abbreviations, normalizing types could, in theory, take non-elementary time.
We are pleased to see that such inefficiency has not occurred in practice: Figure 1
plots verification time against total size (object code plus annotations) for all of 
the files in the compiler. The time stays roughly proportional as file size grows 
by over an order of magnitude. Small files take proportionally longer to verify 
because of start-up costs and the overhead of using hash-consing. Such files take 
just a fraction of a second to verify, so we consider these costs insignificant. 

So far we have presented results for the entire compiler as a whole. By an- 
alyzing the results for different styles of code, we can gain additional insight. 
Of course, all of the code is in the same source language, compiled by the same 
compiler, and written by the authors. Nonetheless, we can partition the files into
several broad categories: 

— Polymorphic libraries: These files provide generally useful utilities such as 
iterators over generic container types. Examples include files for lists, dictio- 
naries, sets, and resizing arrays. 

— Monomorphic libraries: Examples include files for bit vectors and command- 
line arguments. 

— Mostly Type Definitions: These files primarily define types used by the com- 
piler and provide only simple code to create or destruct instances of the type. 
Examples include files for the abstract syntax of Popcorn, the compiler’s in- 
termediate language, an abstract syntax for TALx86, and an environment 
maintained while translating from the intermediate language to TALx86. 

— Machine generated: These files include the scanner and the parser. Compared 
to other styles of code, they are characterized by a small number of large 
functions that contain switch statements with many cases. They also have 
large constant arrays. 

— Compilation: These files actually do the compilation. Examples include files 
for type checking, register allocation, and printing the output. 

Figure 2 summarizes the annotation size and verification time relative to the
categorization. The “Size Ratio” is annotation size divided by the object code
size (smaller is better). The “Time Ratio” is the sum of the two sizes divided by 
the verification time (larger is better). 

Most importantly, all of the size ratios are well within a factor of two and the 
time ratios are even closer to each other. We conclude that no particular style of 
code we have written dominates the overhead of producing provably safe object 
code. Even so, the results differ enough to make some interesting distinctions. 

The files with mostly type definitions have the largest (worst) size ratio and 
largest (best) time ratio. The former is because type definitions are compiled 
into annotations that describe the corresponding TALx86 types, but there is 
no associated object code. The size ratio can actually be arbitrarily high as the
amount of code in a source file goes to zero. The time ratio is also not surprising;
the time-consuming part of verification is checking that each instruction is safe
given its context.

Footnote: The sum of the verification times is slightly less than the time to verify
all the files together due to secondary effects.

                          Annotation Size (kB)        Verification Time (sec)
Sharing  Abbreviations  Uncompressed  Compressed  No hash-consing  Hash-consing
no       no                     2041         155               50            38
no       yes                     793         132               42            36
yes      no                      503         205             37.5          34.5
yes      yes                     419         163             40.5          34.5

Fig. 3. Effect of Abbreviations, Sharing Subterms, and Hash-Consing

The relatively high size ratio for machine-generated code is an artifact of how 
parsers are generated. Essentially, all of the different token types are put into a 
large union. The code that processes tokens is therefore filled with annotations 
that coerce values into and out of this union. 

The size ratio for polymorphic libraries is slightly larger than we expected. 
A source-level function that is polymorphic over some types needs to explicitly 
name those types only once. Because TALx86 has no notion of function, all of the 
labels for such a function must enumerate their type variables. Furthermore,
control transfers between these labels must explicitly instantiate the additional 
type variables. 

Finally, the time ratio is noticeably worse for the compilation code. This style 
of code contains a much higher proportion of function calls than libraries, which 
mostly contain leaf procedures. Because of the complicated type instantiations 
that occur at a call site, call instructions take the most time to verify. 



5.3 Effectiveness of Individual Techniques 

We have shown that our system achieves reasonable performance and uses a 
number of techniques for controlling annotation overhead, but we have not yet 
discussed which of the techniques are effective. In this section, we examine what 
happens if we selectively disable some of these techniques. 

Figure 3 summarizes the total annotation size when the elision threshold
is four and the other techniques are used selectively. When “Sharing” is no,
we do not use tables for sharing types and coercions. Instead, we repeat the
types directly in the annotations. We still share identifiers so that the length of
strings is insignificant. If “Abbreviations” is no, then all abbreviations are fully
expanded before the annotations are written. “Uncompressed” is the total size 
of all the annotations. “Compressed” is the sum of the result of running gzip on 
each file’s annotations separately. The final two columns give total verification 
time with and without hash-consing enabled. 

Footnote: Pre-conditions can still be elided, fortunately.

We first discuss the effect of sharing and abbreviations on the explicit anno-
tation size. Both techniques appear very effective if we ignore the effect of gzip. 
Abbreviations alone reduce size by a factor of 2.57 whereas sharing alone reduces 
size by a factor of 4.06. Using abbreviations and sharing reduces size by another 
seventeen percent as compared to a system with just sharing. Hence neither 
technique subsumes the other, but they recover much of the same repetition. 

However, if what we really care about is the size of annotations that must 
be sent to a code consumer, then we should consider running gzip. It is clear 
that gzip is extremely effective; our worst result for compressed annotations 
is a factor of two better than our best result for uncompressed annotations. 
More subtle is the fact that gzip achieves a smaller result when sharing is not 
used in our binary encoding. This result, which surprised us, is a product of 
how our tables are implemented and how gzip performs compression. In short, 
gzip constructs its own tables and uses a much more compact format than our 
encoding. Worse, our tables hide repetition from gzip, which looks for common 
strings. We conclude that if annotation size is the primary concern, then the 
binary encoding should remain “gzip-friendly”.

Abbreviations are actually much more effective than the data in the figure 
suggests. The compiler’s abbreviations are used only for code pre-conditions, 
so optimizing this one aspect of annotation size must eventually demonstrate 
Amdahl’s Law. We considered what the total annotation size would be if we
removed all explicit code pre-conditions. Of course, the result of this drastic 
measure is unverifiable, but it provides a rough lower bound for the effectiveness 
of the abbreviations. The total size is still 377 kilobytes, so abbreviations reduced 
the size of code pre-conditions by about a factor of four ((2041 - 377) / (793 - 377)).
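
The reduction factors quoted in this subsection follow directly from the uncompressed sizes in Figure 3 and the 377-kilobyte figure above. The short calculation below reproduces them; the variable names are ours.

```python
# Uncompressed annotation sizes in kilobytes, keyed by (sharing, abbreviations).
size_kb = {
    (False, False): 2041,
    (False, True): 793,
    (True, False): 503,
    (True, True): 419,
}
no_preconditions_kb = 377  # size with all explicit code pre-conditions removed

# Whole-file reduction from each technique used alone.
abbreviations_alone = size_kb[(False, False)] / size_kb[(False, True)]
sharing_alone = size_kb[(False, False)] / size_kb[(True, False)]

# Further saving from adding abbreviations to a system that already shares.
extra_saving = 1 - size_kb[(True, True)] / size_kb[(True, False)]

# Reduction restricted to the pre-conditions, the part abbreviations act on.
precondition_factor = (size_kb[(False, False)] - no_preconditions_kb) / (
    size_kb[(False, True)] - no_preconditions_kb
)
```

The whole-file factor for abbreviations alone is about 2.57, while the factor restricted to pre-conditions is 4.0; the gap between the two is the Amdahl's Law effect described above.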

We now discuss the effect of the techniques on verification time. Here gzip is 
useless because our verifier works on uncompressed annotations. Without hash- 
consing, sharing significantly reduces verification time. While the verifier under 
these conditions does not share types that it creates during verification, it does 
share types that originally occur in the annotations. The result suggests that 
these types cover many of those used during verification. Without sharing,
abbreviations are a great help because they recover the most common occurrences.
However, with sharing, abbreviations actually hurt verification time. The time 
to expand the abbreviations during verification outweighs the time that the ad- 
ditional sharing gains. 

With hash-consing, the different verification times are much closer to each 
other. Using a hash-consing table rediscovers any sharing, so without sharing 
initially we have to pay only the cost to achieve this rediscovery. More interest- 
ingly, the penalty for abbreviations disappears. We believe this result is due to 
the fact that with hash-consing, any abbreviation applied to the same argument 
is expanded only once and then the result is used in multiple places. 

Footnote: Actually, there are a few other places where the abbreviations are used,
such as when a polymorphic function is instantiated at a function type, but such
situations are rare in our code.

Footnote: Parsing time is a small but noticeable fraction of the difference.

Hash-consing reduces verification time significantly, but only with a careful
implementation of the hashing. For example, if we give our hash-cons table a size 
near a power of two (as number theory warns against), verification takes
longer than without hash-consing. The good news is that optimizing the verifier
can sometimes be reduced to fundamental properties of data structures. The bad
news is that the difference between verification times under different parameters is
more brittle than we would like.
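
The general hazard can be illustrated with a toy experiment. The hash values below are an assumption chosen to expose the effect (multiples of 64, as might arise from aligned addresses); they are not the verifier's hash function, and the example uses an exact power of two rather than a size merely near one.

```python
def buckets_used(hashes, table_size):
    """Number of distinct buckets hit when bucket = hash mod table_size."""
    return len({h % table_size for h in hashes})


# Assumed hash values with regular structure in their low bits.
hashes = [64 * i for i in range(1000)]

power_of_two = buckets_used(hashes, 1024)  # modulus keeps only the low 10 bits
prime = buckets_used(hashes, 1021)         # 1021 is prime
```

Under these assumptions the 1000 keys land in only 16 buckets of the 1024-entry table, but in 1000 distinct buckets of the 1021-entry table, so lookups in the former degrade toward long chain searches.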

One reason hash-consing improves verification time is that types occupy less 
space, so we expect better cache performance and fewer garbage collections. 
Another reason is that the verifier’s function for determining if one type is a 
subtype of another returns immediately when two types are pointer-equal. This 
function is called about 170,000 times when verifying our compiler. Without 
hash-consing (but with sharing and abbreviations), 45,000 of the calls are with 
pointer-equal arguments. With hash-consing, the figure rises to 82,000. Even 
when the entire types are not pointer-equal, we can avoid much of the structural 
comparison when parts of them are pointer-equal. Without hash-consing, we 
make about 1,400,000 recursive calls. With hash-consing, the number of recursive 
calls drops to 730,000. 

As explained in the previous section, TALx86 code blocks that are targets of 
only forward branches do not need annotations, but they will be reverified along 
every unannotated control-flow path. Given an elision threshold T, our compiler 
ensures that no block will be verified more than T times. Subject to this
constraint, it uses a simple greedy algorithm to leave annotations off labels. Figure 4
shows the effect of changing the value of T. We use sharing and abbreviations.
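
One plausible form of such a greedy pass is sketched below. It is our own reconstruction under stated assumptions, not the compiler's algorithm: blocks are visited in topological order of forward edges, an unannotated block is assumed to be verified once for each verification of each predecessor, and `required` stands for the labels that must keep a pre-condition (call destinations, backward-branch destinations, and first-class values).

```python
def elide(blocks, preds, required, threshold):
    """Greedily leave pre-conditions off labels.

    blocks:    labels in topological order of forward edges
    preds:     label -> list of forward-edge predecessors
    required:  labels that must keep an explicit pre-condition
    threshold: maximum number of times any block may be verified
    Returns (set of annotated labels, verification count per label).
    """
    annotated, count = set(), {}
    for b in blocks:
        incoming = sum(count[p] for p in preds.get(b, []))
        if b in required or incoming == 0 or incoming > threshold:
            annotated.add(b)      # annotated blocks are verified once
            count[b] = 1
        else:
            count[b] = incoming   # re-verified once per incoming verification
    return annotated, count


# Example: three if/else diamonds in sequence, ending at "exit".
blocks = ["h0", "t0", "e0", "h1", "t1", "e1", "h2", "t2", "e2", "exit"]
preds = {
    "t0": ["h0"], "e0": ["h0"], "h1": ["t0", "e0"],
    "t1": ["h1"], "e1": ["h1"], "h2": ["t1", "e1"],
    "t2": ["h2"], "e2": ["h2"], "exit": ["t2", "e2"],
}
annotated, count = elide(blocks, preds, required={"h0"}, threshold=4)
```

With a threshold of four, only the entry and the final join keep pre-conditions in this example, and no block is verified more than four times. A threshold of zero annotates every block, and an infinite threshold annotates only the required labels.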

The top chart in Figure 4 shows that total annotation size drops by over
fifteen percent when T is 1 instead of 0. We conclude that low-level systems should
not require pre-conditions on all blocks. However, the additional space savings 
as T takes values larger than 8 are quite small. This fact justifies the use of 
T = 4 for the other experiments. 

The bottom charts in Figure 4 show the verification time for different values
of T. Verification time initially drops as T gets the value 1 instead of 0. This 
phenomenon indicates that it takes a lot of time to process an explicit annotation 
and compare it to a pre-condition. As T takes values 2, 4, 8, and 16, verification 
time rises noticeably but only by a few seconds. We conclude that this range 
of values allows for reasonable time-space tradeoffs. As T takes larger values, 
verification time rises sharply. Although very few additional blocks have their 
pre-conditions elided, these blocks are then checked a very large number of times. 
In fact, for large T, the time spent verifying different files varies drastically 
because most files do not have any such blocks. (A value of infinity for T means 
we put explicit annotations only where the verifier requires them.) 



5.4 Useful Extensions 

We have presented a system where uncompressed safety annotations consume
roughly half the space of the object code they describe, and we have given
techniques (sharing, abbreviations, and elision) that help in this regard. Now
we investigate whether the current system is the best we can hope to achieve
or if the techniques could contribute more to reducing the TALx86 overhead. By
moving beyond what the current system supports, we demonstrate the latter.

Fig. 4. Effect of Elision Threshold

First, notice that sharing common subterms is so effective because we share 
annotations across an entire file. The file level is currently the best we can do 
because we compile files separately. In a scenario where all of the object files are 
packaged together, we could share annotations in a single table for the entire 
package. Although our current tools cannot process such a package, we are able 
to generate it and measure its size. The total size drops from 419 kilobytes to 338 
kilobytes. We conclude that different files in our project have many similar an- 
notations; we should be able to exploit this property to further reduce overhead. 
This improvement does not rely on understanding the compiler’s conventions, so
a generic TALx86 tool could put separately compiled object files into a package. 

Second, the annotations that describe what coercions apply at each instruc- 
tion are not currently shared. Although there are many common occurrences, 
some of them take only one byte to represent, so sharing these annotations must 
carefully avoid increasing space requirements. 

Third, verification time suffers significantly from memory allocation and 
garbage collection. Although we have implemented hash-consing to address this 
bottleneck, Shao and associates use their experience building type-directed
compilers to suggest that suspension-based lambda encoding (essentially de
Bruijn indices and lazy substitution) can further improve performance. We rele- 
gate to future work modifying the verifier to experiment with these techniques. 

Fourth, some well-chosen uses of type reconstruction could eliminate many 
of the explicit annotations. For example, if the verifier performed unification 
of (first-order) type variables, then the compiler could eliminate all of the type 
applications at control transfer points. This elision would improve our annota- 
tion size to 330 kilobytes. (To compute this figure, we elided the instantiations 
even though the verifier cannot process the result.) Reconstruction approaches 
improve the size even in the presence of gzip; the compressed annotations drop 
from 163 kilobytes to 141 kilobytes in this case. 

In summary, the TALx86 system shows that techniques such as sharing and 
elision make certified code scalable and practical, but even TALx86 could use 
these techniques more aggressively to achieve lower overhead. 

6 Conclusions 

Our Popcorn compiler encodes the safety of its output in TALx86. As a Popcorn 
application itself, it also serves as the largest application we know of that has 
been compiled to a safe machine language. Because we believe safety policies 
should not be tailored to a particular compiler, we encode the aspects of Pop- 
corn compilation relevant to safety in the more primitive constructs of TALx86. 
We have found that the most important factor in the scalability of certifying 
compilation is the size of code pre-conditions. 

Based on our experience, we present the following conclusions for compiler- 
independent certification systems. 

— Common-subexpression elimination of explicit annotations is a practical ne- 
cessity. Sharing terms created during verification is also helpful, but it is 
important to carefully manage the overhead inherent in doing so. 

— Compilers can effectively exploit parameterized abbreviations to encode their 
invariants. Although abbreviations improve the size of explicit annotations, 
it is more difficult to exploit abbreviations during verification. 

— Serial compression utilities, such as gzip, are very helpful, but they are not 
a complete substitute for other techniques. Moreover, if good compression 
is a system requirement, one should understand the compression algorithm 
when designing the uncompressed format. 

— Overhead should never be proportional to the number of loop-free control-
flow paths in a program. 

We believe these suggestions will help other projects avoid common pitfalls 
and focus on the important factors for achieving expressiveness and scalability. 

Acknowledgments. The TALx86 infrastructure is a product of the TAL re- 
search group. Fred Smith contributed greatly to the prototype Popcorn com- 
complete for an elision threshold greater than five; David Kempe proved the
other complexity results. The anonymous reviewers provided many helpful com- 

Abstract. We present the design and implementation of the first com-
plete framework for flexible and safe dynamic linking of native code. 
Our approach extends Typed Assembly Language with a primitive for 
loading and typechecking code, which is flexible enough to support a 
variety of linking strategies, but simple enough that it does not signif- 
icantly expand the trusted computing base. Using this primitive, along 
with the ability to compute with types, we show that we can program 
many existing dynamic linking approaches. As a concrete demonstra- 
tion, we have used our framework to implement dynamic linking for a 
type-safe dialect of C, closely modeled after the standard linking facility 
for Unix C programs. Aside from the unavoidable cost of verification, 
our implementation performs comparably with the standard, untyped 
approach. 



1 Introduction 

A principal requirement in many modern software systems is dynamic
extensibility: the ability to augment a running system with new code without shutting
the system down. Equally important, especially when extensions may be un- 
trusted, is the condition that extension code be safe: an extension should not 
be able to compromise the integrity of the running system. Two examples of 
systems allowing untrusted extensions are extensible operating systems
and applet-based web browsers. Extensible systems that lack safety typi-
cally suffer from a lack of robustness; for example, if the interface of a newer 
version of a dynamically linked library (DLL) changes from what is expected 
by the loading program, its functions will be called incorrectly, very possibly 
leading to a crash. These sorts of crashes are accidental, so in the arena of un- 
trusted extensions the problem is greatly magnified, since malicious extensions 
may intentionally violate safety. 

R. Harper (Ed.): TIC 2000, LNCS 2071, pp. 147- 117^ 2001. 

(c) Springer-Verlag Berlin Heidelberg 2001

The advent of Java and its virtual machine (the JVM) has popularized
the use of language-based technology to ensure the safety of dynamic extensions. 
The JVM bytecode format for extension code is such that the system may verify 
that extensions satisfy certain safety constraints before it runs them. To boost 
performance, most recent JVM implementations use just-in-time (JIT) compil- 
ers. However, because JIT compilers are large pieces of software (typically tens 
of thousands of lines of code), they unduly expand the trusted computing base 
(TCB), the system software that is required to work properly if safety is to be 
assured. To minimize the likelihood of a security hole, a primary goal of all such 
systems is to have a small TCB. 

An alternative approach to verifiable bytecode is verifiable native code, first 
proposed by Necula and Lee with Proof-Carrying Code (PCC). In PCC,
code may be heavily optimized, and yet still verified for safety, yielding good 
performance. Furthermore, the TCB is substantially smaller than in the JVM: 
only the verifier and the security policy are trusted, not the compiler. A variety 
of similar architectures have been proposed.

While verifiable native code systems are fairly mature, all lack a well-designed 
methodology for dynamic linking, the mechanism used to achieve extensibility. 
In the PCC Touchstone system, for example, dynamic linking has only been 
performed in an ad-hoc manner, entirely within the TCB, and the current
Java to PCC compiler, Special J, does not support dynamic linking. Most
general-purpose languages support dynamic linking, so
if we are to compile such languages to PCC, then it must provide some support
for implementing dynamic linking. We believe this support should meet three 
important criteria: 

1. Security. It should only minimally expand the TCB, improving confidence 
in the system’s security. Furthermore, soundness should be proved within a 
formal model. 

2. Flexibility. We should be able to compile typical source language linking 
entities, e.g., Java classes, ML modules, or C object files; and their loading 
and linking operations. 

3. Efficiency. This compilation should result in efficient code, in terms of both 
space and time. 

In this paper, we present the design and implementation of the first complete
framework for dynamic linking of verifiable native code. We have developed this
framework in the context of Typed Assembly Language (TAL), a system of
typing annotations for machine code, similar to PCC, that may be used to verify
a wide class of safety properties. Our framework consists of several small
additions to TAL that enable us to program dynamic linking facilities in a type-safe
manner, rather than including them as a monolithic addition to the TCB. Our
additions are simple enough that a formal proof of soundness is straightforward.
The interested reader is referred to the companion technical report for the
full formal framework and soundness proof.

To demonstrate the flexibility and efficiency of our framework, we have used
it to program a type-safe implementation of DLopen, a UNIX library that
provides dynamic linking services to C programs. Our version of DLopen has
performance comparable to the standard ELF implementation, and has the
added benefit of safety. Furthermore, we can program many other dynamic
linking approaches within our framework, including Java classloaders, Windows
DLLs and COM, Objective Caml's Dynlink [27], Flatt and Felleisen's
Units, and SPIN's domains, among others.

The remainder of this paper is organized as follows. In the next section we
motivate and present our framework, which we call TAL/Load. In Section 3
we describe a type-safe version of DLopen programmed using TAL/Load. In
Section 4 we compare the performance of our type-safe version to the standard
version of DLopen. In Section 5 we discuss how we can program other linking
approaches using TAL/Load, along with other related work. We conclude in
Section 6.

2 Our Approach 

We begin our discussion by considering a straightforward but flawed means of
adding dynamic linking in TAL, to motivate our actual approach, described later.
Consider defining a primitive, load0, that dynamically instantiates, verifies, and
links TAL modules into the running program. Informally, load0 might have the
type:

    $\mathit{load0} : \forall \alpha{:}\mathit{sig}.\ \mathit{bytearray} \to \alpha\ \mathit{option}$

To dynamically load a module, the application first obtains the binary
representation of the module as a bytearray, and provides it to load0 preceded by the
module's expected signature type $\alpha$. Then load0 parses the bytearray, checks it
for well-formedness, and links any unresolved references in the file to their
definitions in the running program. Next, it compares the module's signature with
the expected one; if the signatures match, it returns the module to the caller.
If any part of this process fails, load0 returns NONE to signal an error. As an
example, suppose the file "extension" contains code believed to implement a
module containing a single function f of type int -> int. In informal notation,
that file is dynamically linked as follows:

    case load0 [sig f : int -> int end]
               (read_file "extension") of
        NONE => ... handle error ...
      | SOME m => m.f(12)

There are many problems with this approach. First, it requires first-class
modules; in the context of a rich type system, first-class modules require a
complicated formalization (e.g., Lillibridge) with restrictions on expressiveness;
as a result, in most ML variants (and TAL as well) modules are second-class.
Second, it requires a type-passing semantics, as the type passed to load0
must be checked against the actual type of the module at run-time. This kind of
semantics provides implicit type information to polymorphic functions, contrary
to the efforts of TAL to make all computation explicit. Third, all linking
operations, including tracking and managing the exported definitions of the running
program, and rewriting the unresolved references in the loaded file, occur within
load0, and thus within the TCB. Finally, we are constrained to using the
particular linking approach defined within the TCB, diminishing flexibility. As we
show in Sections 3 and 5, linking is the aspect of extensibility that differs most
among source languages. For example, Java links unresolved references
incrementally, just before they are accessed, while in C all linking generally occurs at
load-time. Furthermore, extensible systems typically require more fine-grained
control over linking. For example, in SPIN only trusted extensions may link
against certain secure interfaces, and in MMM, the runtime interface used
during dynamic linking is a safe subset of the one used during static linking, a
practice called module thinning.

Rather than place all dynamic linking functionality within the TCB, as we
have outlined above with load0, we prefer to place smaller components therein,
forming a dynamic linking framework. Furthermore, these components are
themselves largely composed of pre-existing TAL functionality. Therefore, this
framework does not implement source-level dynamic linking approaches directly, but
may be used to program them.

Our framework defines a primitive load similar to load0 above, but with the
following simplifications:

1. Loaded modules are required to be closed with respect to terms. That is,
they are not allowed to reference any values defined outside of the module
itself. We can compile source-language modules that allow externally-defined
references to be loadable by using a "poor man's functorization," which we
describe below. Modules may, however, refer to externally-defined (i.e.,
imported) type definitions.

2. Rather than return a first-class module, load returns a tuple containing the
module's exported term definitions (and thus the type variable $\alpha$ now is
expected to be a tuple-type, rather than a signature). Any exported type
definitions are added to the global program type interface, a list of types and
their definitions used by the current program, used to resolve the imported
type definitions of modules loaded later.

3. Rather than require a type-passing semantics for the type argument to load,
we make use of term-level representations of types, in the style of Crary et
al.

These simplifications serve three purposes. First, by eliminating possible type
components from the value returned by load, we avoid a complicated modular
theory, at a small cost to the flexibility of the system. Second, the majority of
the functionality of load (parsing binary representations and typechecking) is
already a part of the TCB. By avoiding term-level linking (since loaded modules
must be closed) we can avoid adding binary rewriting and symbol management
to the TCB (we do have to manage type definitions, however, as we explain
in the next subsection). Finally, by adding term-level type representations, we
preserve TAL's type-erasure semantics. These representations also allow the
implementation of a dynamic type, making it possible to program linking facilities
outside of the TCB. We call our framework TAL/Load.

While TAL/Load only permits loading closed TAL modules, in practice we
wish to dynamically load non-closed source modules by resolving their external
references with definitions in the running program. One way to implement this
linking strategy is by translating source-level external references into "holes"
(i.e., uninitialized reference cells), in a manner similar to closure-converting a
function. After the module is loaded via load, these cells are linked appropriately
using a library added to the program. To track the running program's symbols,
we can use term-level type representations, existential types, and a special
checked_cast operator to implement type dynamics, which are amenable to
programming a type-safe symbol table.
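
The hole-based strategy described above can be sketched in a few lines. This is only an illustrative Python model, not TAL or Popcorn code: the names Hole, make_client, and link are hypothetical, and Python performs none of the static checking that TAL would.

```python
from typing import Callable, Dict, Optional


class Hole:
    """An initially empty reference cell standing in for an external symbol."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value: Optional[Callable] = None

    def __call__(self, *args):
        if self.value is None:
            raise RuntimeError(f"unresolved symbol: {self.name}")
        return self.value(*args)


def make_client():
    """A 'closed' module: every external reference is a hole the module owns."""
    holes = {"foo": Hole("foo")}

    def bar(i: int) -> int:
        return holes["foo"](i)  # indirect call through the hole

    return holes, {"bar": bar}


def link(holes: Dict[str, Hole], symbols: Dict[str, Callable]) -> None:
    """Linking library outside the loader: fill each hole from the program's symbols."""
    for name, hole in holes.items():
        hole.value = symbols[name]


holes, exports = make_client()
link(holes, {"foo": lambda i: i + 1})
result = exports["bar"](3)
```

The point of the design is that the loader itself never has to patch references; filling the cells is ordinary code that runs after loading.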

We defer a complete discussion of how to effectively use TAL/Load until
Section 3, where we describe our implementation of a full-featured dynamic
linking approach for C programs. For the remainder of this section, we focus on
two things. First, we look more closely at the process of closing a module with
respect to its externally defined types and terms. We explain the difficulty with
closing a module with respect to named types, thus motivating our solution
of using the program type interface. Second, we describe the implementation of
TAL/Load in the TALx86 implementation of TAL.



2.1 Comparing Types by Name 

The complications with first-class structures arise because of their type com- 
ponents; if M and N are arbitrary expressions of module type having a type 
component t, it is difficult at compile-time to determine if M.t is equal to (is the 
same type as) N.t. The problem arises because we do not know the identities of 
types M.t and N.t, and therefore must use their names (including the paths) to 
compare them. 

In the absence of these named types¹, closing a module with respect to its
externally-defined terms is fairly simple. For example, consider the following
SML module, perhaps forming part of an I/O library, that supports the opening
and reading of text files.

    structure TextIO =
    struct
      type instream = int
      val openIn : string -> instream = ...
      val inputLine : instream -> string = ...
    end

¹ Named types are also called branded types, and can be used to implement
abstract types (as in first-class modules) and generative types (such as structs in C or
datatypes in ML).
A client of this module might be something like:

    fun doit () =
      let val h = TextIO.openIn "myfile.txt" in
        TextIO.inputLine h
      end

If we want to close this client code to make it amenable for dynamic loading, we 
need to remove the references to the TextIO module. For example, we could do: 

    val TextIO_openIn :
        (string -> int) option ref = ref NONE
    val TextIO_inputLine :
        (int -> string) option ref = ref NONE
    fun doit () =
      let val h = valOf (!TextIO_openIn) "myfile.txt" in
        valOf (!TextIO_inputLine) h
      end

We have converted the externally referenced function into a locally defined ref- 
erence to a function. When the file is dynamically loaded, the reference can get 
filled in. This strategy is essentially a “poor man’s” functorization. This process 
closes the file with respect to values. However, we run into difficulty when we 
have externally defined values of named type. Consider if TextIO wished to hold 
the type instream abstract. If we attempt to close the client code as before, we 
get: 

    val TextIO_openIn :
        (string -> TextIO.instream) option ref = ...
    val TextIO_inputLine :
        (TextIO.instream -> string) option ref = ...

We still have the external references to the type TextIO.instream itself. We
must have a way to load a module referring to externally defined, named types.
Because types form an integral part of typechecking, a trusted operation, our
solution is to support name-based type equality within the TCB. As we do not
want to overly complicate the TCB, we base the support for named types on that
of TAL's framework for static link verification. There, paths are disregarded
altogether in comparing types; only one module may export a type with a given
name. A related project, TMAL, approaches this problem differently, as we
describe in Section 5.

Therefore, loaded code is not closed with respect to externally defined types,
but instead declares a type interface $(X_I, X_E)$, which is a pair of maps from
type names to implementations. $X_I$ mentions the named types provided by other
modules, and $X_E$ mentions named types defined by this one. By not including
the implementation of the type inside a map $X$ (just mentioning its name),
we can use this mechanism to implement abstraction. As an example, the type
interface of the client code above would be something like:

    ({instream}, {})

and the interface for TextIO would be the reverse:

({}, {instream}) 

Part of the implementation of load maintains a list of the imported and
exported types of all the modules in the program, called the program type interface.
When a new module is loaded, load checks that the named type imports of the
new module are consistent with the program type interface, and that the
exports of the new module do not redefine, or define differently, any types in the
program type interface imports. We do not require that all of a module's type
imports be defined by the program interface when it is loaded. This relaxation
requires a uniform representation of named types; in our case, all named types
are pointer-types. Not requiring defined imports facilitates loading a file that
has mutually-recursive type definitions. In particular, the loaded file indicates
the type it expects from another file to be loaded. When the other file is loaded,
its export is confirmed to match the previously loaded import.
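
The consistency check on the program type interface can be modelled roughly as follows. This is a simplified Python sketch under stated assumptions: type definitions are plain strings, abstraction (maps that mention only a name) is ignored, and the function name merge_interface is hypothetical rather than part of TAL/Load.

```python
from typing import Dict, Optional, Tuple

TypeMap = Dict[str, str]              # type name -> definition (modelled as a string)
Interface = Tuple[TypeMap, TypeMap]   # (imports, exports)


def merge_interface(program: Interface, module: Interface) -> Optional[Interface]:
    """Return the new program type interface, or None if the module is inconsistent."""
    p_imports, p_exports = program
    m_imports, m_exports = module

    for name, expected in m_imports.items():
        # An import must agree with any existing definition or earlier expectation.
        known = p_exports.get(name, p_imports.get(name))
        if known is not None and known != expected:
            return None

    for name, definition in m_exports.items():
        if name in p_exports:
            return None               # only one module may export a given name
        if name in p_imports and p_imports[name] != definition:
            return None               # differs from what an earlier module expected

    return ({**p_imports, **m_imports}, {**p_exports, **m_exports})


empty: Interface = ({}, {})
# The client is loaded first, before the module that defines instream.
after_client = merge_interface(empty, ({"instream": "int"}, {}))
ok = merge_interface(after_client, ({}, {"instream": "int"}))
clash = merge_interface(after_client, ({}, {"instream": "string"}))
```

Here `ok` is a merged interface while `clash` is None, which mirrors the case where a later export fails to match a previously recorded import.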

We have developed a formal calculus for our framework and have proven
it sound. While this formalization is interesting, our real contribution lies in
the way we can program type-safe dynamic linking within our framework. We
refer the interested reader to the companion technical report for the full
theoretical treatment.

2.2 Implementation 

We have implemented TAL/Load in the TALx86 implementation of TAL.
The key component of TAL/Load is the load primitive:

    $\mathit{load} : \forall \alpha.\ (R(\alpha) \times \mathit{bytearray}) \to \alpha\ \mathit{option}$

In addition to the bytearray containing the module data, load takes a term
representation of its type argument, following the approach of Crary et al.'s
$\lambda_R$. Informally, $\lambda_R$ defines term representations for types, called R-terms, and
types to classify these terms, called R-types. For example, the term to represent
the type int would be $R_{\mathit{int}}$, and the type of this term would be $R(\mathit{int})$. The type
$R(\tau)$ is a singleton type; for each $\tau$ there is only one value that inhabits it, namely the
representation of $\tau$. Therefore the typechecker guarantees the correspondence
between a type variable checked statically and the representation of that type
used at runtime.

The actions of load are illustrated in Figure 1. In the figure, the square boxes
indicate unconditional actions, and the diamond boxes indicate actions that
may succeed or fail. Each square and diamond box has data inputs and outputs,
indicated as wavy boxes; the arrows illustrate both data- and control-flow. Using
components of the TALx86 system, load performs two functions:

Fig. 1. The implementation of load



1. Disassembly. The first argument $R_\tau$ indicates the expected type $\tau$ of the
exports, and must be disassembled into the internal representation of TAL
types. Type $\tau$ should always be of tuple type, where each element type
represents the type of one of the object file's exported values. The second
argument to load is a byte array representing the object file and the typing
annotations on it; while conceptually a single argument, in practice TALx86
separates the annotations from the object code, resulting in an object file
and a types file. The contents of these two files, stored in buffers, are
disassembled and combined to produce the appropriate internal representation:
a TAL implementation.

2. Verification. The TAL implementation is then typechecked in the context
of the program's current type interface, following the procedure described
in the previous subsection. If typechecking succeeds, the result is a list of
exported values and exported types. The values are gathered into a tuple,
the type of which is compared to the expected type. If the types match, the
tuple is returned (within an option type) to the caller, and the exported
types are combined with the current interface to form the new program type
interface. On failure, null (i.e., NONE) is returned.
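
The two stages above can be illustrated by the following control-flow sketch. It is a toy Python model only: the "object file" is assumed to be JSON text listing typed exports, the type language is just "int" and "string", and the helper names disassemble and typecheck are hypothetical stand-ins for the TALx86 components.

```python
import json
from typing import Optional, Tuple

# Hypothetical toy type language; a tuple type is a tuple of these names.
CHECKERS = {
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def disassemble(data: bytes) -> Optional[list]:
    """Parse the binary form; None models an ill-formed object file."""
    try:
        exports = json.loads(data.decode("utf-8"))["exports"]
    except (ValueError, KeyError, TypeError):
        return None
    return exports if isinstance(exports, list) else None


def typecheck(exports: list) -> Optional[Tuple[tuple, tuple]]:
    """Verify each export against its annotation; return (values, types)."""
    values, types = [], []
    for entry in exports:
        if not isinstance(entry, dict):
            return None
        ty, val = entry.get("type"), entry.get("value")
        if not isinstance(ty, str) or ty not in CHECKERS or not CHECKERS[ty](val):
            return None
        values.append(val)
        types.append(ty)
    return tuple(values), tuple(types)


def load(expected: tuple, data: bytes) -> Optional[tuple]:
    """Return the tuple of exported values, or None if any stage fails."""
    exports = disassemble(data)
    if exports is None:
        return None
    checked = typecheck(exports)
    if checked is None:
        return None
    values, actual = checked
    return values if actual == expected else None


good = json.dumps({"exports": [{"type": "int", "value": 12}]}).encode()
loaded = load(("int",), good)
```

The sketch omits the program type interface update and the actual placement of code in memory, both of which the real load also performs.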

The majority of the functionality described above results in no addition to the
TAL trusted computing base. In particular, the TAL link verifier, typechecker,
and disassembler are already an integral part of the TCB; TAL/Load only
makes these facilities available to programs through load. Three pieces of trusted
functionality are needed, however, beyond that already provided by TAL: loading
the object code into the address space of the running program, representing types
as runtime values, and maintaining the program type interface at runtime. We
explain how these elements impact the TCB below.















Loading. Following the verification process, before returning to the caller, some
C code is invoked to load the object code into the address space. This loading
code is based on that used by the Linux kernel to dynamically load modules. We
describe the code for ELF object files, used in the TALx86 Linux implementation;
COFF files, used in the Windows implementation, are similar.

First, the file is parsed, performing well-formedness checks and extracting
the ELF file's section headers, which describe the file's format. The file must be
a relocatable object file, as is normally produced by a compiler for separate
compilation, e.g., by cc -c. The sections of interest are the code and data sections,
the relocations section, and the symbol tables. Second, the code and data are
logically arranged in the order and alignment specified by the file and the ELF
standard, and the total required size is computed. Third, any externally-defined
symbols are resolved (more on this below). Fourth, an appropriately-sized buffer
is allocated and the code and data are copied to that buffer (TAL uses garbage
collection, so the buffer is allocated using the GC allocator)². This code is then
relocated to work relative to the allocated buffer's address. Finally, the address
of the buffer is returned to the caller (which is the result of load).
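
The layout, copy, and relocation steps can be sketched as follows. This is a deliberately simplified Python model, not the loader's C code: it assumes only 32-bit little-endian absolute relocations, and the tuple formats and the names layout and load_image are hypothetical; real ELF relocation handling has more cases.

```python
import struct
from typing import Dict, List, Optional, Tuple

Section = Tuple[bytes, int]                    # (contents, required alignment)
Reloc = Tuple[int, int, Optional[str], int]    # (section, offset, symbol or None, addend)


def align_up(offset: int, alignment: int) -> int:
    return (offset + alignment - 1) // alignment * alignment


def layout(sections: List[Section]) -> Tuple[List[int], int]:
    """Assign each section an offset; return the offsets and the total size."""
    offsets, cursor = [], 0
    for contents, alignment in sections:
        cursor = align_up(cursor, alignment)
        offsets.append(cursor)
        cursor += len(contents)
    return offsets, cursor


def load_image(sections: List[Section], relocations: List[Reloc],
               symbols: Dict[str, int], base: int) -> bytearray:
    """Copy sections into a fresh buffer and patch absolute addresses.

    A relocation with symbol None is internal: its addend is an offset
    within the image, so the patched value is base + addend.
    """
    offsets, size = layout(sections)
    image = bytearray(size)                    # fresh buffer, never the input bytes
    for (contents, _), off in zip(sections, offsets):
        image[off:off + len(contents)] = contents
    for sec, where, symbol, addend in relocations:
        target = base if symbol is None else symbols[symbol]
        struct.pack_into("<I", image, offsets[sec] + where,
                         (target + addend) & 0xFFFFFFFF)
    return image


sections = [(b"\x90" * 5, 1), (b"\x00" * 8, 4)]
relocs = [(1, 0, None, 0), (1, 4, "GC_malloc", 0)]
image = load_image(sections, relocs, {"GC_malloc": 0x08049000}, base=0x10000)
```

Copying into a newly allocated buffer, rather than patching the input in place, corresponds to the aliasing concern discussed in the footnote on this allocation.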

It is troublesome that we resolve (i.e., link) external symbols during the
loading process. Much of the motivation of our approach is to perform linking outside
the TCB, in part to avoid the additional complexity. In fact, the overwhelming
majority of symbols are linked by mechanisms outside the TCB, as we show in
the next section. However, there are some trusted symbols that cannot easily be
linked in this way. These symbols are part of the macro instructions of TALx86.
Macro instructions do not map directly to a machine instruction, but instead to
a machine instruction sequence; this sequence may include references to external
symbols. For example, the macro for the TALx86 malloc instruction consists of
six machine instructions, of which two are calls to external functions, one to
GC_malloc (to actually allocate the memory), and the other to out_of_memory
(in case the GC allocator returns null). The file cannot be closed with respect
to these calls, because they are primitive.

As a result, when a file containing a malloc instruction is dynamically loaded,
the external calls to these functions must be resolved by the loader. We do this
by rewriting the code directly, using the relocations provided in the object file.
Patching symbols in this manner has the unfortunate consequence that loaded
code cannot be shared between (OS-level) processes because the patched symbols,
like GC_malloc, may be at different addresses in each process.

Given that we must link some symbols implicitly — that the module does not 
truly have to be ‘closed’ — it is reasonable to ask “why not link all symbols in this 
way?” The answer is that it would greatly reduce our flexibility and our security. 

As motivated at the start of this section, by moving symbol management outside
of the TCB, we can better control how symbols are stored (i.e., in what data
structure), how they are apportioned among users of various privilege levels,
how they are interfaced, etc., without changing the trusted computing base;
instead we can rely on the system to verify that this 'untrusted' code is safe.

² Note that this allocation is necessary; we cannot reuse the buffer containing the
object file data to avoid the copy. The reason is that load effectively changes the
type of the buffer argument from bytearray to some type $\alpha$. Placing the object
file contents in a fresh buffer prevents surreptitiously modifying the given buffer via
an alias still having bytearray type. We could avoid this copy by proving that no
aliases exist, e.g., by using alias types.

While implicit linking seems to be necessary for TALx86 macro instructions,
it may be that our approach could be improved. In particular, if the symbols
referred to by macro sequences (e.g., GC_malloc) were always loaded at the same
address, then we could share the code between processes. Given that most
modern operating systems support separate, per-process address spaces, and that
both ELF and COFF files allow the loaded address for a program component
to be specified, this should be possible. It would furthermore allow the
relocation process to take place outside of the TCB, preceding the call to load. The
disassembler would then check for the particular, fixed address when checking
the well-formedness of macro instruction sequences, rather than looking for an
external symbol reference.



Passing Types at Runtime. Term representations for types are used, among
other things, to preserve TAL's type-erasure semantics. So that this addition to
the TAL trusted computing base can be kept small, we do two things. First,
we represent R-terms using the binary format for types already used by the
TAL disassembler. Note that the binary representation of a named type is a
string containing the name. Second, we do not provide any way within TAL to
dynamically introduce or deconstruct R-terms, such as via appropriate syntax
and typecase. Doing so would require that we reflect the entire binary format
of types into the type system of TAL. Instead, we only allow the introduction
of R-terms in the static data segment by a built-in directive. Consequently, only
closed types may be represented.

Aside from providing type information to load, R-types are also useful for
implementing dynamic types. Dynamic types may be used to implement
type-safe symbol management, as we describe in the next section. Therefore we allow
limited examination of R-terms with a simple primitive called checked_cast:

    $\mathit{checked\_cast} : \forall \alpha.\ \forall \beta.\ (R(\alpha) \times R(\beta) \times \beta) \to \alpha\ \mathit{option}$

Informally, checked_cast takes a value of type $\beta$ and casts it to one of type $\alpha$ if
the types $\alpha$ and $\beta$ are equal. This operation is trivial to add as comparing types
is part of the TAL typechecker. Therefore it does not add to the TCB. With a
full implementation of $\lambda_R$ including typecase, checked_cast does not need to be
primitive.
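
To make the role of checked_cast concrete, the following Python sketch models type representations as structurally compared values and builds a dynamic type on top of them. The names Rep, Dyn, fn, and from_dyn are hypothetical, and Python cannot enforce the singleton-type correspondence between a representation and a static type that TAL's typechecker guarantees; that link is simply assumed here.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class Rep:
    """Runtime representation of a closed type, compared structurally."""
    name: str
    args: Tuple["Rep", ...] = ()


INT = Rep("int")
STRING = Rep("string")


def fn(arg: Rep, result: Rep) -> Rep:
    """Representation of a one-argument function type."""
    return Rep("fn", (arg, result))


def checked_cast(target: Rep, source: Rep, value: Any) -> Optional[Any]:
    """Return the value if both representations denote the same type, else None."""
    return value if target == source else None


@dataclass(frozen=True)
class Dyn:
    """A dynamically typed value: a representation packaged with a value."""
    rep: Rep
    value: Any


def from_dyn(target: Rep, d: Dyn) -> Optional[Any]:
    """Recover a statically usable value from a Dyn, or None on a type mismatch."""
    return checked_cast(target, d.rep, d.value)


packed = Dyn(fn(INT, INT), lambda i: i + 1)
succ = from_dyn(fn(INT, INT), packed)
wrong = from_dyn(fn(INT, STRING), packed)
```

A symbol table mapping names to Dyn values then needs no trusted code beyond checked_cast itself. One limitation of this sketch is that a stored value of None is indistinguishable from a failed cast.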



Maintaining the Program Type Interface. As explained in the previous
subsection, the need to maintain the program's type interface at runtime derives
directly from the presence of named types in TAL. We may use elements already
within the TCB to implement the program type interface. Representations of
type interfaces $(X_I, X_E)$ already exist as a part of object files; they are used in
verifying static link consistency. The initial program type interface is set up by a
small bit of code generated by the TAL static linker after it has determined the
program's type interface. Computing the new type interface at run time is done
using this same trusted code for static link verification, so maintaining this
information at run time does not significantly expand the TCB.

3 Programming Dynamic Linking 

Having defined our dynamic linking framework TAL/Load, we now describe
how to use TAL/Load to program dynamic linking services as typically defined
in source languages like C and Java. As a concrete demonstration, we present
a type-safe version of DLopen, a standard dynamic-linking methodology for
C, that we have written using TAL/Load. Our version, called DLpop, provides
the same functionality for Popcorn, a type-safe dialect of C. We chose to
implement DLopen over several other dynamic linking approaches because it is
the most general; we describe informal encodings of other approaches, including
Java classloaders, in Section 5. We begin by describing DLpop and the
ways in which it differs from DLopen, and then follow with a description of our
implementation written in TAL/Load.



3.1 DLpop: A Type-Safe DLopen 

Most Unix systems provide some compiler support and a library of utilities
(interfaced in the C header file dlfcn.h) for dynamically linking object files. We
call this methodology DLopen, after the principal function it provides. We have
implemented a version of DLopen for our type-safe C-like language, Popcorn,
which we call DLpop. The library interface is essentially identical to DLopen
except that it is type-safe; it is depicted in Figure 2. We describe this interface in
detail below, noting differences with DLopen; a thorough description of DLopen
may be found in Unix documentation. DLpop and DLopen both provide three
core functions:



    extern handle;
    extern handle dlopen(string fname);
    extern a dlsym<a>(handle h, string sym, <a>rep typ);
    extern void dlclose(handle h);

    extern exception WrongType(string);
    extern exception FailsTypeCheck;
    extern exception SymbolNotFound(string);



Fig. 2. DLpop library interface 
handle dlopen(string fname)

Given the name of an object file, dlopen dynamically loads the file and
returns a handle to it for future operations. Imports in the file (i.e., symbols
declared extern therein) are resolved with the exports (i.e., symbols not
declared static) of the running program and any previously loaded object
files. Before it returns, dlopen will call the function _init if that function is
defined in the loaded file. In DLpop (but not DLopen), dlopen typechecks
the object file, throwing the exception FailsTypeCheck on failure. In
addition, the exception SymbolNotFound will be raised if the loaded file imports
a symbol not present in the running program, or WrongType if a symbol in
the running program does not match the type expected by the import in the
loaded file. DLopen functions, in general, report errors with an errno-like
facility.

a dlsym<a>(handle h, string sym, <a>rep typ)

In DLpop, dlsym takes a handle for a loaded object file h, a string naming
the symbol sym, and the representation of the symbol's type typ; dlsym returns
a pointer to the symbol's value. The syntax <a> refers to the type argument
a (not its representation) to dlsym. In lambda-calculus notation, dlsym
therefore has the type

    $\mathit{dlsym} : \forall \alpha.\ \mathit{handle} \times \mathit{string} \times R(\alpha) \to \alpha$

In DLopen, dlsym does not receive a type argument, and the function
returns an untyped pointer (null on failure), of C-type void *, which requires
the programmer to perform an unchecked cast to the expected type. The
fact that our version takes a type representation argument typ to indicate
the expected type means that this type can be (and is) checked against
the actual type at runtime. In practice, this type always has the form of a
pointer type since the value returned is a reference to the requested symbol.
As in TAL, we have extended Popcorn with representation types (<a>rep),
implementing them with TAL R-types. The term representing type t in
Popcorn is denoted repterm@<t>. Because we cannot create the
representation of a type with free type variables in TAL, the type argument a to
dlsym must also be a closed type. If the requested symbol is not present in
the object file, the exception SymbolNotFound is thrown; if the passed type
does not match the type of the symbol, the exception WrongType is thrown.

void dlclose (handle h) 

In DLopen, dlclose unloads the file associated with the given handle. In 
particular, the file’s symbols are no longer used in linking, and the memory 
for the file is freed; the programmer must make sure there are no dangling 
pointers to symbols in the file. In DLpop, dlclose only removes symbols 
from future linkages; if the user program does not reference the object file, 
then it can be garbage collected. 

The current version of DLpop does not implement all of the features of DLopen,
most notably: DLopen automatically loads object files upon which a dynamically
loaded file depends, allowing for recursive references; DLopen supports the
ability to optionally resolve function references on-demand, rather than all at
load-time, assuming the underlying mechanisms (e.g., an ELF procedure linkage
table) are present in the object file; and DLopen provides a sort of
finalization by calling the user-defined function _fini when unloading object
files. We foresee no technical difficulties in adding these features should the need
arise. In a later version of DLpop, we implemented a variant of dlopen that
allows the caller to specify a list of object files to load, and these files may have
mutually-recursive (value) references. On-demand function symbol resolution is
also feasible; a possible compilation strategy to support it is described below, and
another approach is described in Section 5. Finally, finalization is implemented
in most garbage collectors, in particular the Boehm-Demers-Weiser collector
used in the current TAL implementation.

Dynamically linked code: loadable.pop

    extern int foo(int);

    int bar(int i) {
      return foo(i);
    }

Static code: main.pop

    int foo(int i) {
      return i+1;
    }

    void pop_main() {
      handle h = dlopen("loadable");
      int bar(int) = dlsym(h, "bar", repterm@<int (int)>);
      bar(3);
      dlclose(h);
    }

Fig. 3. DLpop dynamic loading example

Figure 3 depicts a simple use of DLpop. The user statically links the file
main.pop, which, during execution, dynamically loads the object file loadable.o
(the result of compiling loadable.pop), looks up the function bar, and then
executes it; the type argument to dlsym is inferred by the Popcorn compiler.
The dynamically linked file also makes an external reference to the function
foo, which is resolved at load time from the exports of main.pop.
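
The behaviour of dlopen and dlsym in this example can be modelled in miniature as follows. This Python sketch is illustrative only: type representations are assumed to be plain strings such as "int (int)", a loadable module is modelled as a function that receives a resolver and returns its exports, and typechecking of object code (FailsTypeCheck) and dlclose are omitted.

```python
from typing import Any, Callable, Dict, Tuple


class SymbolNotFound(Exception):
    pass


class WrongType(Exception):
    pass


Table = Dict[str, Tuple[str, Any]]   # symbol name -> (type representation, value)


def lookup(table: Table, sym: str, typ: str) -> Any:
    """Type-checked lookup shared by import resolution and dlsym."""
    if sym not in table:
        raise SymbolNotFound(sym)
    actual, value = table[sym]
    if actual != typ:
        raise WrongType(sym)
    return value


class Handle:
    def __init__(self, exports: Table) -> None:
        self.exports = exports


def dlopen(module: Callable[[Callable[[str, str], Any]], Table],
           program: Table) -> Handle:
    """'Load' a module: resolve its imports against the program, record its exports."""
    exports = module(lambda sym, typ: lookup(program, sym, typ))
    program.update(exports)          # later loads may link against these symbols
    return Handle(exports)


def dlsym(h: Handle, sym: str, typ: str) -> Any:
    return lookup(h.exports, sym, typ)


def loadable(resolve):
    foo = resolve("foo", "int (int)")            # extern int foo(int);
    return {"bar": ("int (int)", lambda i: foo(i))}


program: Table = {"foo": ("int (int)", lambda i: i + 1)}
h = dlopen(loadable, program)
bar = dlsym(h, "bar", "int (int)")
result = bar(3)
```

Unlike the void * returned by the C dlsym, a mismatched type request here raises WrongType instead of silently producing an unusable pointer.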
    struct got_t {                    /* the type of the global offset table */
      int (int) foo;
    }

    struct got_t GOT = { dummy };     /* the global offset table itself */

    static int dummy(int i) {         /* to avoid null checks, all */
      raise(Failure);                 /* fields have dummy values */
    }

    static int bar(int i) {           /* function recompiled to */
      return GOT.foo(i);              /* reference the global offset table */
    }

    /* initialization function called by dlopen */
    void dyninit(a lookup<a>(string, <a>rep),
                 void update<a>(string, a, <a>rep)) {
      /* resolve file's imports */
      int (int) foo = lookup("foo", repterm@<int (int)>);
      GOT.foo = foo;
      /* add the exported function to the symbol table */
      update("bar", bar, repterm@<int (int)>);
    }

Fig. 4. Compilation of dynamically loadable code



3.2 Implementing DLpop in TAL/Load 

Our implementation of DLpop is similar to implementations of DLopen that
follow the ELF standard for dynamic linking, which requires both library
and compiler support. In ELF, dynamically loadable files are compiled so that
all references to data are indirected through a global offset table (GOT) present 
in the object file. Each slot in the table is labeled with the name of the symbol 
to be resolved. When the file is loaded dynamically, the dynamic linker fills each 
slot with the address of the actual exported function or value in the running 
program; these exported symbols are collected in a dynamic symbol table, used 
by the dynamic linker. This table consists of a list of hashtables, one per object 
file, each constructed at compile-time and stored as a special section in the object 
file. As files are loaded and unloaded, the hashtables are linked and unlinked from 
the list, respectively. 

We describe our DLpop implementation below, pointing out differences with 
the ELF approach. We first describe the changes we made to the Popcorn com- 
piler, and then describe how we implemented the DLpop library. 



Compilation. As in the ELF approach, dynamically loadable files must be 
specially compiled, an operation that we perform in three stages. First, the 
compiler must define a GOT for the file, and translate references to externally 
defined functions and data to refer to slots in the GOT. In ELF, the GOT is
a trusted part of the object file, while in DLpop the GOT is implemented in
the verifiable language, TAL. As a consequence, the table is well-typed with
the compiler initializing each slot to a dummy value of the correct type, where
possible. For slots of abstract type, we cannot create this dummy value, so we
initialize the slot to null and insert null checks for each table access in order to
satisfy the typechecker.

    // foo is still exported (not static) so statically linked files may refer to it
    int foo(int i) {
      return i+1;
    }

    void pop_main() {
      handle h = dlopen("loadable");
      int bar(int) = dlsym(h, "bar", repterm@<int (int)>);
      bar(3);
      dlclose(h);
    }

    // initialization function called at startup
    void dyninit(a lookup<a>(string, <a>rep),
                 void update<a>(string, a, <a>rep)) {
      // add the exported function to the symbol table
      update("foo", foo, repterm@<int (int)>);
    }

Fig. 5. Compilation of statically linked code

Second, the compiler adds a special dyninit function that will be called at 
load-time to fill in the slots in the GOT with the proper symbols. This approach 
differs from ELF, in which the GOT is filled by a dynamic linker contained in the 
running program. From the loading program’s point of view, the dyninit func- 
tion abstracts the linking process. The dyninit function takes as arguments 
two other functions, lookup and update, that provide access to the dynamic 
symbol table. For each symbol address to be stored in the GOT, dyninit will 
look up that address by name and type using the lookup function, and fill in 
the appropriate GOT slot with the result. Similarly, dyninit will call update 
with the name, type, and address of each symbol that it wishes to export. Be- 
cause the dyninit function consists only of TAL code, all linking operations 
are verifiably type-safe. This verification prevents, for example, lookup from re- 
questing a symbol by name, then receiving a symbol of an unexpected type. In 
an untypechecked setting, as in DLopen, this operation could result in a crash. 
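
The linking protocol just described can be modelled in a few lines. The
following Python sketch is only an illustration under stated assumptions: Python
has no counterpart to the TAL verifier, so type representations are modelled as
plain strings and the type comparison happens dynamically. The names
SymbolTable, make_loadable, WrongType and SymbolNotFound are hypothetical
stand-ins chosen for this sketch, not part of the Popcorn or TAL sources.

```python
# Illustrative model of the dyninit protocol (not the Popcorn/TAL implementation).
# Assumption: a type representation is modelled as a string such as "int (int)".

class WrongType(Exception):
    """Raised when a symbol exists but its recorded type differs from the request."""

class SymbolNotFound(Exception):
    """Raised when no symbol of the requested name has been exported."""

class SymbolTable:
    def __init__(self):
        self._entries = {}  # name -> (value, type representation)

    def update(self, name, value, rep):
        # Export a symbol together with the representation of its type.
        self._entries[name] = (value, rep)

    def lookup(self, name, rep):
        # Look a symbol up by name and type; a mismatch raises an exception.
        if name not in self._entries:
            raise SymbolNotFound(name)
        value, stored_rep = self._entries[name]
        if stored_rep != rep:
            raise WrongType(name)
        return value

def make_loadable():
    """Model of a loadable file that imports foo and exports bar."""
    def dummy(i):
        raise RuntimeError("Failure")  # placeholder so the slot is never null

    got = {"foo": dummy}  # the global offset table

    def bar(i):
        return got["foo"](i)  # every external reference goes through the GOT

    def dyninit(lookup, update):
        got["foo"] = lookup("foo", "int (int)")  # resolve the file's imports
        update("bar", bar, "int (int)")          # register the file's exports

    return dyninit

table = SymbolTable()
table.update("foo", lambda i: i + 1, "int (int)")  # export of the main program
make_loadable()(table.lookup, table.update)        # the loader runs dyninit
bar = table.lookup("bar", "int (int)")
print(bar(3))
```

Requesting bar at a different type, for example "int (string)", raises
WrongType in this model instead of handing back a mistyped value.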

Finally, because the exports of dynamically linked files are designated by
dyninit, the object file should only export dyninit itself; therefore the compiler
makes all global symbols static. Figure 4 shows the entire translation for the
dynamic code in Figure 3.

Statically linked files are only changed by adding a dyninit to export symbols
to dynamically linked files. At startup, the program calls the dyninit functions
of each of its statically linked files. Figure 5 shows the static code of Figure 3
compiled in this manner.

    struct got_t {
      int (int) foo;
    }

    struct got_t GOT = { dummy };

    static int dummy(int i) {
      int (int) foo = dynlookup("foo", repterm@<int (int)>);   // look up foo
      GOT.foo = foo;                                           // replace dummy in the GOT
      return GOT.foo(i);                                       // call it
    }

    static int bar(int i) {
      return GOT.foo(i);
    }

    // saved lookup function as passed to dyninit
    static a dynlookup<a>(string, <a>rep) = ...;

    void dyninit(a lookup<a>(string, <a>rep),
                 void update<a>(string, a, <a>rep)) {
      dynlookup = lookup;                                      // note the lookup function
      update("bar", bar, repterm@<int (int)>);
    }

Fig. 6. Compilation of dynamically loadable code to resolve functions on-demand. Only
the parts that differ from Figure 4 are commented.

Rather than add the dyninit function to fill in the GOTs of loaded files and
note their exported symbols, we could have easily followed the ELF approach of
writing a monolithic dynamic linker, called at startup and from dlopen. However,
we have found that abstracting the process of linking to calling a function in the
loaded file has a number of benefits. First, it allows the means by which an object
file resolves its imported symbols to change without affecting the DLpop library.
For example, in order to save space, we could allow GOT entries to be null
by changing them to option type, or we could eliminate the GOT altogether
by using runtime code generation, as described in Section 5. If we knew that
many symbols may not be used by the loading program (as is likely with a
large shared library), we could resolve them on-demand by making the dummy
functions perform the symbol resolution, rather than doing so in the dyninit
function; this approach is shown in Figure 6.
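
As a rough illustration of on-demand resolution, the Python sketch below models
a dummy GOT entry that resolves its symbol on first use and then patches the
slot. It is a simplified model, not the Popcorn output: type representations are
plain strings, and the names make_lazy_loadable, state and symbols are
hypothetical.

```python
# Illustrative model of on-demand symbol resolution; all names are hypothetical.

def make_lazy_loadable():
    state = {"lookup": None, "resolutions": 0}

    def dummy(i):
        # First call only: resolve foo, overwrite the GOT slot, forward the call.
        foo = state["lookup"]("foo", "int (int)")
        state["resolutions"] += 1
        got["foo"] = foo
        return got["foo"](i)

    got = {"foo": dummy}  # the slot starts out holding the resolving stub

    def bar(i):
        return got["foo"](i)

    def dyninit(lookup, update):
        state["lookup"] = lookup         # save the lookup function for later
        update("bar", bar, "int (int)")  # exports are still registered eagerly

    return dyninit, state

# A toy symbol table: name -> (value, type representation).
symbols = {"foo": (lambda i: i + 1, "int (int)")}

def lookup(name, rep):
    value, stored_rep = symbols[name]
    if stored_rep != rep:
        raise TypeError(name)
    return value

def update(name, value, rep):
    symbols[name] = (value, rep)

dyninit, state = make_lazy_loadable()
dyninit(lookup, update)
bar = lookup("bar", "int (int)")
first = bar(3)   # triggers the resolution of foo
second = bar(4)  # goes straight through the patched GOT slot
```

In this model the lookup cost is paid at most once per imported symbol, and not
at all for symbols that are never called.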

Second, dyninit simplifies the implementation of policy decisions made by
the loading code with regard to symbol management. For example, the loading
code may wish to restrict access to some of its symbols based on security
criteria; in this case, it could customize the lookup function provided to dyninit
to throw an exception if a restricted symbol is requested.
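
One way to picture such a policy is as a wrapper around the lookup function.
The Python sketch below is a hypothetical illustration: the names restrict,
AccessDenied and secret_key are invented for the example, and an allow-list is
only one of many possible policies.

```python
# Illustrative sketch of a restricted lookup function; all names are hypothetical.

class AccessDenied(Exception):
    """Raised when loaded code asks for a symbol the loader has restricted."""

def restrict(lookup, allowed_names):
    """Wrap a lookup function so that only names in allowed_names resolve."""
    allowed = frozenset(allowed_names)

    def restricted_lookup(name, rep):
        if name not in allowed:
            raise AccessDenied(name)
        return lookup(name, rep)

    return restricted_lookup

# A toy symbol table: name -> (value, type representation as a string).
symbols = {
    "foo": (lambda i: i + 1, "int (int)"),
    "secret_key": (42, "int"),
}

def lookup(name, rep):
    value, stored_rep = symbols[name]
    if stored_rep != rep:
        raise TypeError(name)
    return value

def untrusted_dyninit(lookup, update):
    """Loaded code that tries to import a symbol it should not see."""
    lookup("foo", "int (int)")   # permitted by the policy
    lookup("secret_key", "int")  # refused by the policy

try:
    untrusted_dyninit(restrict(lookup, {"foo"}), lambda name, value, rep: None)
    outcome = "linked"
except AccessDenied as denied:
    outcome = "denied: " + str(denied)
```

The loaded file is unchanged by the policy; only the function handed to its
dyninit differs.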

Finally, using dyninit allows the loaded file to customize operations per- 
formed at link-time. For example, by adding a flag to prevent calls to update 
from occurring on subsequent calls to dyninit (and thus only the lookup calls 
are performed), we can enable code relinking. This allows us to dynamically up- 
date the module in a running program: we load a new version of a module, link 
it as usual, and then relink the other modules in the program to use the new 
module by calling their dyninit functions. Any needed state translation can be 
performed by the new module’s _init function. Though not described here, we
have fully explored this idea with an alternative version of DLpop, and
used it to build a dynamically updateable webserver, FlashEd.

The DLpop Library. The DLpop interface in Figure 2 is implemented as a
Popcorn library. The central element of the library is a type-safe implementation 
of the dynamic symbol table for managing the symbols exported by the running 
program. We first describe this symbol table, and then describe how the DLpop 
functions are used in conjunction with it. 

DLpop encodes the dynamic symbol table as in ELF, as a list of hashtables 
mapping symbol names to their addresses, one hashtable per linked object file. 
Each time a new object file is loaded, a new hashtable is added. The dynamic 
symbol table is constructed at start-up time by calling the dyninit functions 
for all of the statically linked object files. 

Each entry of the hashtable contains the name, value, and type representation
of a symbol in the running program, with the name as the key. So that entries
have uniform type, we use existential types to hide the actual type of the
value:

    $\mathit{objfile\_ht} : \langle \mathit{string},\ \exists\alpha.\,(\alpha \times R(\alpha)) \rangle\ \mathit{hashtable}$

To update the table with a new symbol (the result of calling update from
dyninit), we pack the value (say of type $\beta$) and type representation (of type
$R(\beta)$) together in an existential package, hiding the value’s type, and insert that
package into the table under the symbol’s key. When looking up a symbol
expected to have type $\alpha$, and given a term representation r of type $R(\alpha)$, we do
the following. First, the symbol’s name is used to index the symbol hashtable,
returning a package having type $\exists\beta.\,(\beta \times R(\beta))$. During unpacking, the tuple is
destructed, binding a type variable $\beta$, and two term variables, table_value and
table_rep, of type $\beta$ and $R(\beta)$, respectively. We then call

    $\mathit{checked\_cast}\,[\alpha]\,[\beta]\,(r,\ \mathit{table\_rep},\ \mathit{table\_value})$

which compares r and table_rep, and coerces table_value from type $\beta$ to
type $\alpha$ if they match. This value is then returned to the caller. Otherwise, the
exception WrongType is raised.
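
The packing and unpacking steps can be sketched as follows. This Python model
is an approximation made for illustration: Python cannot express the existential
type statically, so the hidden type is simply an untyped field, and type
representations are modelled as nested tuples compared structurally. The names
Package, fn, INT and STRING are assumptions of the sketch.

```python
# Illustrative model of the symbol hashtable with "existential" packages.
# Assumption: a type representation is a nested tuple, compared structurally.
from dataclasses import dataclass
from typing import Any, Dict, Tuple

class WrongType(Exception):
    """Raised when the requested and the stored type representations differ."""

@dataclass(frozen=True)
class Package:
    """A value paired with the representation of its (hidden) type."""
    value: Any
    rep: Tuple

INT = ("int",)
STRING = ("string",)

def fn(arg, result):
    """Representation of a one-argument function type."""
    return ("fn", arg, result)

def checked_cast(r, table_rep, table_value):
    # Compare the two representations; on a match the value may be used at
    # the requested type, otherwise the lookup fails with an exception.
    if r != table_rep:
        raise WrongType(f"expected {r}, found {table_rep}")
    return table_value

objfile_ht: Dict[str, Package] = {}

def update(name, value, rep):
    objfile_ht[name] = Package(value, rep)  # pack, hiding the value's type

def lookup(name, r):
    package = objfile_ht[name]
    table_value, table_rep = package.value, package.rep  # unpack
    return checked_cast(r, table_rep, table_value)

update("foo", lambda i: i + 1, fn(INT, INT))
foo = lookup("foo", fn(INT, INT))
```

A request such as lookup("foo", fn(STRING, INT)) raises WrongType in this
model, mirroring the failure case described above.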

The type $\langle \tau_1, \tau_2 \rangle$ hashtable contains mappings from $\tau_1$ to $\tau_2$.

The DLpop library essentially consists of wrapper functions for load and the
dynamic symbol table manipulation routines:

dlopen

Recall that dlopen takes as its argument the name of an object file to 
load. First it opens and reads this object file into a bytearray. Because 
of the compilation strategy we have chosen, all loadable files should export 
a single symbol, the dyninit function. Therefore, we call load with the 
dyninit function’s type and the bytearray, and should receive back the 
dyninit function itself as a result. If load returns NONE, indicating an error, 
dlopen raises the exception FailsTypeCheck. Otherwise, a new hashtable 
is created, and a custom update function is crafted that adds symbols to it. 
The returned dyninit function is called with this custom update function, 
as well as with a lookup function that works on the entire dynamic symbol 
table. After dyninit completes, the new hashtable is added to the dynamic 
symbol table, and then returned to the caller with abstract type handle.

dlsym 

This function receives a type argument (call it a) and three term arguments: 
a handle, h; a string representing the symbol name, s; and the representa- 
tion of the type a, r. Because the handle object returned by dlopen is in 
actuality the hashtable for the object file, dlsym simply attempts to look 
up the given symbol in that hashtable, following the procedure outlined 
above, raising the exception SymbolNotFound if the symbol is not present, 
or WrongType if the types do not match. 

dlclose 

The dlclose operation simply removes the hashtable associated with the
handle from the dynamic symbol table. Future attempts to look up symbols
using this handle will be unsuccessful. Once the rest of the program no longer 
references the handle’s object file, it will be safely garbage-collected. 
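
The three operations can be summarized in a small Python model. It is a sketch
under explicit assumptions rather than the library itself: the load parameter
is a hypothetical stand-in for the trusted load primitive (it returns a dyninit
function, or None on failure), type representations are strings, and rejecting
closed handles in dlsym is one possible reading of the behaviour described for
dlclose.

```python
# Illustrative model of dlopen, dlsym and dlclose; not the Popcorn library.

class FailsTypeCheck(Exception):
    pass

class SymbolNotFound(Exception):
    pass

class WrongType(Exception):
    pass

dynamic_symbol_table = []  # one dict per linked object file

def _find(table, name, rep):
    if name not in table:
        raise SymbolNotFound(name)
    value, stored_rep = table[name]
    if stored_rep != rep:
        raise WrongType(name)
    return value

def global_lookup(name, rep):
    # Lookup over the entire dynamic symbol table.
    for table in dynamic_symbol_table:
        if name in table:
            return _find(table, name, rep)
    raise SymbolNotFound(name)

def dlopen(filename, load):
    dyninit = load(filename)  # hypothetical stand-in for verification + loading
    if dyninit is None:
        raise FailsTypeCheck(filename)
    table = {}

    def update(name, value, rep):  # custom update: adds to the new table only
        table[name] = (value, rep)

    dyninit(global_lookup, update)
    dynamic_symbol_table.append(table)
    return table  # the handle is the file's own hashtable

def dlsym(handle, name, rep):
    # Assumption of this sketch: a closed handle no longer resolves symbols.
    if not any(table is handle for table in dynamic_symbol_table):
        raise SymbolNotFound(name)
    return _find(handle, name, rep)

def dlclose(handle):
    dynamic_symbol_table[:] = [t for t in dynamic_symbol_table if t is not handle]

def loadable_dyninit(lookup, update):
    foo = lookup("foo", "int (int)")
    update("bar", lambda i: foo(i), "int (int)")

def fake_load(filename):
    return loadable_dyninit if filename == "loadable" else None

# The initial table, as built at startup from statically linked code.
dynamic_symbol_table.append({"foo": (lambda i: i + 1, "int (int)")})
h = dlopen("loadable", fake_load)
result = dlsym(h, "bar", "int (int)")(3)
dlclose(h)
```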

As a closing remark, we emphasize the value of implementing DLpop. We have 
not intended DLpop to be a significant contribution in itself; rather, the contribu- 
tion lies in the way in which DLpop is implemented. By using TAL/Load, much 
of DLpop was implemented within the verifiable language, and was therefore 
provably safe. Only load and $\lambda_R$ constitute trusted elements in its
implementation, and these elements are themselves small. If some flaw exists in DLpop, the
result will be object files that fail to verify, not a security hole. 

We should point out that the implementation described here (and measured 
in the next section) is the first of two DLpop implementations. Our most
recent implementation, described fully elsewhere, differs in two key ways from the
one described here. First, rather than perform the dynamic transformation for 
files within the compiler, we do it source-to-source, preceding compilation. De- 
coupling the transformation from the compiler results in a more modular and 
flexible implementation, but required the addition of some features to Popcorn. 
Second, the newer implementation is more full-featured. It supports loading mod- 
ules with mutually-recursive references, and allows for dynamically updating a 
module, as described above. The principles behind the two implementations are 
essentially the same. 

4 Measurements

Much of the motivation behind TAL and PCC is to provide safe execution of 
untrusted code without paying the price of byte-code interpretation (as in the
JVM) or sandboxing (as in the Exokernel). Therefore, while the chief goal
of our work is to provide flexible and safe dynamic linking for verifiable native 
code, another goal is to do so efficiently. 

In this section we examine the time and space costs imposed by load and 
DLpop. We compare these overheads with those of DLopen (using the ELF im- 
plementation) and show that our overheads are competitive. In particular, our 
run-time overhead is exactly the same, and our space overhead is comparable. 
The verification operation constitutes an additional load-time cost, but we be- 
lieve that the cost is commensurate with the benefit of safety, and does not 
significantly reduce the applicability of dynamic linking in most programs. All 
measurements presented in this section were taken on a 400 MHz Pentium II 
with 128 MB of RAM, running Linux kernel version 2.2.5. DLopen/ELF mea- 
surements were generated using gcc version egcs-2.91.66. 



4.1 Time Overhead 

The execution time overhead imposed by dynamic linking, relative to Popcorn 
programs that use static linking only, occurs on three time scales: run-time, 
load-time, and start-time. At run-time, each reference to an externally defined 
symbol must be indirected through the GOT. At load-time, the running program 
must verify and copy the loaded code with load, and then link it by executing its 
dyninit function. At startup, statically linked code must construct the initial 
dynamic symbol table. DLopen/ELF has similar overheads, but lacks verification 
and its associated benefit of safety. 



Run-time Overhead. In most cases, the only run-time overhead of dynamic 
code is the need to access imported symbols through the GOT; this overhead 
is exactly the same as that imposed by the ELF approach. Each access requires 
one additional instruction, which we have measured in practice to cost one extra 
cycle. A null function call in our system costs about 7 cycles, so the dynamic 
overhead of an additional cycle is about 14%. 

For imported values of abstract type, there is also the cost of the null check 
before accessing each GOT element. However, we have yet to see this overhead 
occur in practice. Most files do not export abstract values, but instead “con- 
structor” functions that produce abstract values; an exception in our current 
code base is the Popcorn Core library, which defines stdin, stdout, and stderr 
to have abstract type FILE. These cases typically define the abstract type to 
allow a null value (a sort of abstract option type), meaning that a null-check 
would have occurred anyway. 

Load-time Overhead. The largest load-time cost in DLpop is verification.
Verification in load consists of two conceptual steps, disassembly and verification,
as pictured in Figure 1 and described in Section 2. Verification itself is
performed in two phases: consistency checking (labeled typecheck in the figure) and
interface checking (labeled t = typeof(vs)? in the figure). For the loadable.pop
file, presented in Figure 3, the total time of these operations is 47 ms, where 2%
is disassembly, 96% is consistency checking, and the remaining 2% is interface
checking. Detailed measurements concerning the cost of TAL verification may
be found elsewhere; in general, verification costs are reported to be linear in the
size of the file being verified.

The remaining cost is to copy the verified code and to execute the file’s 
dyninit function. For loadable .pop, the total cost of these two operations is 
negligible: about 0.73 ms. This time is roughly twice the time of 0.35 ms for 
DLopen/ELF. The main difference here is simply that the ELF loader is more 
optimized. Because of its small weight relative to verification, there is little reason 
to optimize linking in DLpop. 
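
For concreteness, the load-time figures quoted above can be turned into
per-phase times with a few lines of arithmetic. The Python snippet below uses
only the numbers stated in the text (47 ms, the 2%/96%/2% split, 0.73 ms and
0.35 ms); since those percentages are rounded, the derived values are
approximations.

```python
# Back-of-the-envelope breakdown of the load-time figures quoted in the text.
# All times are in milliseconds; the percentages are rounded in the source.
VERIFY_MS = 47.0    # disassembly + verification of loadable.pop
LINK_MS = 0.73      # copying the verified code and running dyninit
ELF_LINK_MS = 0.35  # the corresponding DLopen/ELF time
SHARES = {
    "disassembly": 0.02,
    "consistency checking": 0.96,
    "interface checking": 0.02,
}

phase_ms = {phase: VERIFY_MS * share for phase, share in SHARES.items()}
total_ms = VERIFY_MS + LINK_MS
verify_fraction = VERIFY_MS / total_ms
link_ratio = LINK_MS / ELF_LINK_MS

for phase, ms in phase_ms.items():
    print(f"{phase:22s} {ms:6.2f} ms")
print(f"{'linking':22s} {LINK_MS:6.2f} ms")
print(f"verification share of total load time: {verify_fraction:.1%}")
print(f"DLpop linking time relative to ELF:    {link_ratio:.2f}x")
```

On these numbers consistency checking accounts for roughly 45 ms of the load
time, which is why the optimizations discussed next concentrate on it.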

Verification is by far the most expensive load-time operation, but its cost 
could be reduced in three ways. First, the verification code could be further
optimized for speed. In particular, proof-carrying code’s Touchstone compiler
has demonstrated small verification times, albeit with a different type system,
and even TAL’s implementors recognize that further gains could be made [16].
Furthermore, disassembly has not been optimized. Second, verification could be 
performed in parallel with normal service. After verification completes, only link- 
ing remains, which has negligible overhead. Finally, in the case of a trusted sys- 
tem, we could turn off the consistency-checking phase during verification, since it 
can be run for each loaded file on some other machine. Leaving on link-checking 
and interface-checking still ensures that the loaded code meshes with the run- 
ning program at the module level, but trusts that the contents of the loaded 
module are well-formed. Since consistency-checking is the most time-consuming 
operation, we greatly reduce our total update times as a result. Breaking up the 
verification operation across server and client machines has also been explored
for Java.

Even with current overheads, verification occurs but once per extension, and 
so should not pose a problem for most applications. Applications that load code 
at larger time scales, and/or for which loaded code is long-lived, will amortize 
the cost of verification over the entire computation. Long running systems that 
load extensions or updates, such as operating systems and network servers, and 
productivity applications that use dynamically loaded libraries fall into this cat- 
egory. Even those applications for which loaded code is short-lived, e.g., agent 
systems, could be accommodated, because while verification time may be large, 
execution time (thanks to native code) will be small, balancing out the total 
cost. 



Start-time Overhead. At start-time, before execution begins, each statically 
linked file’s dyninit function is executed to create the initial dynamic symbol
table for the program. In addition, the program type interface, generated by
the linker, is properly instantiated for use by load. The costs of these operations
depend on the number of symbols and type definitions exported by each file, and
which libraries are used. A typical delay is on the order of tens of milliseconds,
which is negligible over the lifetime of programs that perform dynamic linking.

In contrast, ELF imposes no start-time cost, because no type interface is used, 
and because the static linker generates the hashtables that make up the dynamic 
symbol table, storing them in the object file. This implementation trades space 
for time. 



4.2 Space Overhead 

Both DLpop and DLopen/ELF increase the size of object files relative to their 
compilation without dynamic linking support. Based on some simple
measurements, they appear to be fairly comparable in practice. For the most part the
per-symbol costs for DLpop are higher than those of DLopen/ELF, but there is
a significantly smaller fixed cost. For the remainder of this section we break
down the space costs of DLpop, and compare them to those of DLopen/ELF.

For both imported and exported symbols, DLpop imposes three space costs:
the string representation of the symbol name, its type representation, and the
instructions in the dyninit function that perform its linking. For imported
symbols, there is the additional cost of the symbol’s GOT slot and its default
value. These costs are summarized in Table 1 and compared to the overheads
of DLopen/ELF. DLopen/ELF overheads were determined in part from
examining object files on our platform. The fixed cost was estimated by subtracting
the per-symbol costs from the total calculated overhead shown in Figure 7.

The per-symbol cost of DLpop is about one and a half times that of
DLopen/ELF when type representations (of size t) are not included. Type
representations tend to be large, between 128 and 200 bytes for functions, increasing
total overhead when they are considered. We mitigate this cost somewhat by sharing
type representations among elements of the same type. One factor that adds 
to function type representation size is that the representation encodes not only 
the types of the function arguments and returned values, but also the calling 
convention. This fact suggests that sharing type components among represen- 
tations would net a larger savings, since the calling convention is the same for 
all Popcorn functions. We could also reduce per-symbol overhead by eliminating 
dyninit and moving the linking code into the DLpop library. However, dyninit 
is a convenient, flexible way to perform linking, justifying the extra space cost. 

DLopen/ELF has a much higher fixed space cost than DLpop. This comes
from a number of sources, including load-time and unload-time code sequences,
and data structures that aid in linking. In ELF, each of the hashtables of the
dynamic symbol table is constructed at compile-time and stored in the object
file. Some of the hashtable overhead is per-symbol, but there is also a large fixed
cost for the empty buckets in order to improve the hash function accuracy. In
DLpop, these tables are constructed at start-time, creating a start-up penalty
but avoiding the extra space cost per object file.

Popcorn strings have a length field and an extra pointer (for easier translation
to/from C-style strings), adding 2 words to a C-style representation.

Table 1. Object file overheads, in bytes, for both DLpop and ELF. DLpop overheads
are broken down into component costs; l is the length of a symbol’s name and t is the
size of its TAL type representation.

Figure 7 compares DLpop to DLopen/ELF for some benchmark files. Each of
the four clusters of bars in the graph represents a different source file, with vary- 
ing numbers of imported and exported functions, notated xi ye at the bottom 
of the cluster, where x and y are the number of imports and exports, respec- 
tively. When there is one exported function, its code consists of calling all of 
the imported functions; when there are fifteen functions, each one calls a single 
imported function. All functions are void (void) functions. Each bar in the
cluster represents a different compilation approach. The leftmost is the stan- 
dard DLpop approach, and the rightmost is DLopen/ELF. The center bar is 
DLpop without the sharing of type representations, to show worst case behavior 
(when sharing, only one type representation for void (void) is needed). Each 
bar shows the size of object files when compiled statically, compiled to export 
symbols to dynamic code, and compiled to be dynamically loadable (thus im- 
porting and exporting symbols). The export-only case is not shown for ELF, as 
this support is added at static link time, rather than compile-time. 

The figure shows that DLpop is competitive with DLopen/ELF. The figure 
also illustrates the benefit of type representation sharing; the overhead for the
15i 15e case when not sharing is almost twice that when sharing is enabled. As the
number of symbols in the file increases, the ELF approach will begin to out- 
perform DLpop, but not by a wide margin for typical files (exporting tens of 
symbols). In general, we do not feel that space overheads are a problem (nor 
did the designers of ELF dynamic linking, it seems). We could structure our 
object files so that the dyninit function, which is used once, and type repre- 
sentations, which are used infrequently, will not affect the cache, and may be 
easily paged out. Type representations are highly compressible (up to 90% using 
gzip), and therefore need not contribute to excessive network transmission time 
for extensions. 

The type void (void) is the Popcorn (C-like) notation for the type $\mathit{unit} \to \mathit{unit}$.

* For one-word values, this is the cost of the value plus a pointer; structured values
are larger.

Fig. 7. Comparing the space overhead of DLpop, DLpop without type representation
sharing, and DLopen/ELF for some microbenchmarks. 



5 Programming Other Linking Strategies 
(Related Work) 

Using our framework TAL/Load, we can implement safe, flexible, and efficient 
dynamic linking for native code, which we have illustrated by programming a safe 
DLopen library for Popcorn. Many other dynamic linking approaches have been 
proposed, for both high and low level languages. In this section we do two things. 
First, we describe the dynamic linking interfaces of some high level languages, 
describe their typical implementations, and finally explain how to program them 
in TAL/Load, resulting in better security due to type safety and/or reduced TCB 
size. Second, we look at some low-level mechanisms used to implement dynamic 
linking, and explain how we can program them in our framework. Overall, we 
demonstrate that TAL/Load is flexible enough to encode typical dynamic linking 
interfaces and mechanisms, but with a higher level of safety and security. 

5.1 Java 

In Java, user-defined classloaders may be invoked to retrieve and instantiate 
the bytes for a class, ultimately returning a Class object to the caller. A
classloader may use any means to locate the bytes of a class, but then relies on the
trusted functions ClassLoader.defineClass and ClassLoader.resolveClass
to instantiate and verify the class, respectively. When invoked directly, a
classloader is analogous to dlopen. Returned classes may be accessed directly, as
with dlsym, if they can be cast to some entity that is known statically, such as 
an interface or superclass. In the standard JVM implementation, linking occurs 
incrementally as the program executes: when an unresolved class variable is ac- 
cessed, the classloader is called to obtain and instantiate the referenced class. 
In the standard JVM implementation, all linking operations occur within the 
TCB: checks for unresolved class variables occur as part of JVM execution, and 
symbol management occurs within resolveClass. 

We can implement classloaders in TAL/Load by following our approach for 
DLpop: we compile classes to have a GOT and a dyninit function to resolve
and register symbols. A classloader may locate the class bytes exactly as in
Java (i.e., through any means programmable in TAL), and defineClass simply
becomes a wrapper for a function similar to dlopen, which calls load and then 
invokes the dyninit function of the class with the dynamic symbol table. 

To support incremental linking, we can alter the compilation of Java to TAL 
(hypothetically speaking) in two ways. We first compile the GOT, which holds 
references to externally defined classes, to allow null values (in contrast to DLpop 
where we had default values). Each time a class is referenced through the GOT, 
a null check is performed; if the reference is null then we call the classloader to 
load the class, filling in the result in the GOT. Otherwise, we simply follow the 
pointer that is present. As in the strategy depicted in Figure 6, the dyninit
function no longer fills in the GOT at load-time; it simply registers its symbols
in the dynamic symbol table. This approach moves both symbol management
and the check for unresolved references into the verifiable language, reducing the
size of the TCB.
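
The null-check strategy for incremental linking can be illustrated with a small
model. The Python sketch below is hypothetical throughout: LazyGot,
toy_classloader and the Helper class are invented names, and a Python callable
stands in for the classloader.

```python
# Illustrative model of a GOT whose slots may be null and are filled on demand.

class LazyGot:
    def __init__(self, classloader):
        self._classloader = classloader
        self._slots = {}  # class name -> loaded class, or None if unresolved

    def declare(self, name):
        self._slots[name] = None  # slots start out null

    def get(self, name):
        cls = self._slots[name]
        if cls is None:                    # the null check on every reference
            cls = self._classloader(name)  # load the class on first use
            self._slots[name] = cls        # fill in the result in the GOT
        return cls                         # otherwise follow the stored pointer

loaded = []

def toy_classloader(name):
    """Hypothetical classloader that records each load and builds a class."""
    loaded.append(name)
    return type(name, (), {"greet": lambda self: "hello from " + name})

got = LazyGot(toy_classloader)
got.declare("Helper")
loads_before_use = len(loaded)
first = got.get("Helper")().greet()
second = got.get("Helper")().greet()
```

The classloader runs once, on the first reference; later references pay only
for the null check.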



5.2 Windows DLLs and COM 

Windows allows applications to load Dynamically Linked Libraries (DLLs) into 
running applications, following an interface and implementation quite similar to 
DLopen and ELF, respectively, with some minor differences (see Levine [21, pp.
217-222]). Like DLopen and ELF, DLLs are not type-safe and would therefore
benefit in this regard from an implementation in TAL/Load. 

DLLs are often used as a vehicle to load and manipulate Component Object
Model (COM) objects. COM objects are treated abstractly by their clients,
providing access through one or more interfaces, each consisting of one or more
function pointers. All COM objects must implement the interface IUnknown,
which provides the function QueryInterface, to be called at runtime to
determine if the object implements a particular interface. QueryInterface is called
with the globally unique identifier (GUID) that names the desired interface.
GUIDs are not incorporated into the type-system (at least not for source
languages like C and C++), and thus, as with dlsym, the user is forced to cast the
object’s returned interface to the type expected, with a mistake likely resulting
in a crash.

Implementing COM in TAL/Load would be straightforward, with the added
benefit of proven type-safety for interfaces. QueryInterface could be changed
to take a type parameter $R(t)$ in addition to the GUID of the expected interface,
ensuring the proper type of the returned interface.



5.3 OCaml Modules 

Objective Caml (OCaml) provides dynamic linking for its bytecode-based
runtime system with a special Dynlink module; these facilities have been used to
implement an OCaml applet system, MMM. Dynlink essentially implements
dlopen, but not dlsym and dlclose, and would thus be easy to encode in
TAL/Load. In contrast to the JVM, OCaml does not verify that its extensions
are well-formed, and instead relies on a trusted compiler. OCaml dynamic linking
is similar to that of other type-safe, functional languages, e.g., Haskell.
A TAL/Load implementation of the OCaml interface would improve on its
current implementation in two ways. First, all linking operations would occur
outside of the TCB. Second, extension well-formedness would be verified rather
than assumed.



5.4 Units 

Units are software construction components, quite similar to modules. A
unit may be dynamically linked into a static program with the invoke primitive, 
which takes as arguments the unit itself (perhaps in some binary format) and 
a list of symbols needed to resolve its imports. Linking consists of resolving the 
imports and executing the unit’s initialization function. Invoke is similar to 
dlopen, but the symbols to link are provided explicitly, rather than maintained 
in a global table. 

Units could be implemented following DLpop, but without a dynamic symbol 
table. Rather than compiling the dyninit function to take two functions, lookup 
and update, it would take as arguments the list of symbols needed to fill the 
imports. The function would then fill in the GOT entries with these symbols, and 
then call the user-defined _init function for the unit. The implementation for 
invoke would call load, and then call the dyninit function with the arguments 
supplied to invoke. 
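
A minimal model of this variant is sketched below in Python. It rests on
several assumptions made only for illustration: imports are passed as a
dictionary from names to (value, type representation) pairs, type
representations are strings, the load parameter is a hypothetical stand-in for
the trusted load primitive, and LinkError is an invented exception name.

```python
# Illustrative model of Units-style linking: imports are supplied explicitly
# to dyninit instead of being found in a global symbol table.

class LinkError(Exception):
    pass

def make_unit(log):
    """Model of a compiled unit that imports foo and has an _init function."""
    got = {"foo": None}

    def _init():
        # User-defined initialization; here it just calls the import once.
        log.append(got["foo"](41))

    def dyninit(imports):
        for name, expected_rep in (("foo", "int (int)"),):
            if name not in imports:
                raise LinkError("missing import: " + name)
            value, rep = imports[name]
            if rep != expected_rep:
                raise LinkError("wrong type for import: " + name)
            got[name] = value  # fill in the GOT entry
        _init()                # then run the unit's initialization

    return dyninit

def invoke(unit, imports, load):
    dyninit = load(unit)  # hypothetical stand-in for verification + loading
    if dyninit is None:
        raise LinkError("unit failed verification")
    dyninit(imports)

log = []
invoke("unit-bytes",
       {"foo": (lambda i: i + 1, "int (int)")},
       lambda unit: make_unit(log))
```

A missing or mistyped import surfaces as a LinkError before _init runs, which
corresponds to the graceful failure mode argued for in the next paragraph.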

The current Units implementation is similar to the one we have described
above, but is written in Scheme (rather than TAL), a dynamically typed lan- 
guage. Therefore, while linking errors within dyninit may be handled gracefully 
in our system (since they will result in thrown exceptions), in Scheme they will 
result in run-time type errors, halting system service. Alternatively, run-time 
type checks would have to be provided for each access of the GOT. 
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5.5 SPIN 

The extensible operating systems community has explored a number of
approaches to dynamic linking. For example, the SPIN kernel may load
untrusted extensions written in the type-safe language Modula-3. In SPIN,
dynamic linking operates on objects called domains, which are collections of
code, data, and exported symbols. Domains are quite similar to Units, with the
functionality of invoke spread among separate functions for creation, linking,
and initialization, along with other useful operations, including unlinking and
combining. All of these operations are provided by the trusted Domain module.
Furthermore, all operations are subject to security checks based on runtime
criteria. For example, when one domain is linked against the interface of another,
the interface seen may depend on the caller’s privilege.

We can implement domains using techniques described above, with the ad- 
dition of filters to take security information into account. TAL/Load would im- 
prove on the security of the current SPIN implementation in the same ways as 
OCaml: less of the domain implementation must be trusted, and integrity of 
extensions can be verified, rather than relegated to a trusted compiler. 
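
To make the role of filters concrete, here is a minimal Python sketch, assuming a hypothetical `Domain` class and `link` function (neither is SPIN's or TAL/Load's actual interface) and an integer privilege level invented for the example. It shows only the idea that the interface a client links against is computed from run-time security information.

```python
from typing import Any, Callable, Dict, Iterable

# A filter decides, from run-time criteria, whether a caller may see a symbol.
Filter = Callable[[str, int], bool]


class Domain:
    """Hypothetical model of a domain: exported symbols plus a filter."""

    def __init__(self, exports: Dict[str, Any], visible: Filter):
        self._exports = dict(exports)
        self._visible = visible
        self.imports: Dict[str, Any] = {}

    def interface_for(self, privilege: int) -> Dict[str, Any]:
        # The interface seen depends on the caller's privilege.
        return {name: val for name, val in self._exports.items()
                if self._visible(name, privilege)}


def link(client: Domain, provider: Domain, privilege: int,
         needed: Iterable[str]) -> None:
    """Resolve `needed` names of `client` against `provider`'s filtered view."""
    needed = list(needed)
    view = provider.interface_for(privilege)
    missing = [name for name in needed if name not in view]
    if missing:
        raise PermissionError(f"symbols not visible: {missing}")
    for name in needed:
        client.imports[name] = view[name]


# Usage: `raw_disk` is only visible to callers with privilege >= 1.
kernel = Domain({"read": "read_impl", "raw_disk": "raw_impl"},
                visible=lambda name, priv: priv >= 1 or name != "raw_disk")
ext = Domain({}, visible=lambda name, priv: True)
link(ext, kernel, privilege=0, needed=["read"])
```

The filter is ordinary program code here, which mirrors the point made above: the policy is programmed on top of the linking mechanism rather than built into it.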

5.6 TMAL 

The TAL module system implemented for TALx86, MTAL (Modular Typed
Assembly Language), provides a typed version of standard static linking
facilities. Typed Module Assembly Language (TMAL) is an alternative module
system for TAL that provides a different model of linking, including dynamic
linking. Our work in TAL/Load is an extension to TAL to allow dynamically
linking MTAL modules. Therefore, TMAL and TAL/Load can be seen as two
ways to solve similar problems. TMAL has not been implemented.

TMAL adds a simple notion of first-class modules to TAL; by using explicit 
coercions accompanied by runtime checks, the type system remains decidable. 
The operations provided for TMAL module values are much like those for SPIN 
domains, described above. Two modules can be linked together to form a third 
module, and the circumstances of linking can be customized. In particular, co- 
ercions are provided to remove exported names from a module, and to rename 
its types and/or values. In addition, modules can be linked with symbols from 
the program (rather than other modules). 

TMAL also provides primitives for reflection. In particular, TMAL’s dlsym_v
is essentially the same as DLpop’s dlsym. MTAL, and thus TAL/Load, makes
the simplification that all named types are global, as we explained earlier. As
a module is loaded, its type components are added into the global namespace.
However, in TMAL, first-class modules can contain type components, which
introduces a level of hierarchy. As a result, TMAL provides a dlsym_t operation
for looking up a type component of a module, to be used prior to retrieving a
value that has that type.

Finally, TMAL provides primitives for creating and loading dynamically-linked
libraries, respectively; the latter operation is similar to load, and the
former is something that we do at compile-time.

The major difference between TAL/Load and TMAL is that TAL/Load is
intended for programming the sorts of operations that TMAL provides as
primitive; the result is a smaller TCB. On the other hand, the goal of TMAL is to
preserve and statically verify the constraints expressed by the source module
language at the assembly language level. We could easily implement the
majority of TMAL using TAL/Load, where the notion of handle as implemented in
DLpop is analogous to a first-class module in TMAL. Breaking the linking
functionality out of DLpop’s dlopen into the various TMAL linking primitives would
be straightforward for values, but tricky for types, though still possible; e.g. our
technical report describes a way to implement load to hide global types from
loaded modules, and we could use existential types to implement something like
dlsym_t. However, in such an implementation, some properties that could be
statically verified by TMAL would have to be dynamically checked by load.

On the other hand, programming provides flexibility. In the case of values, we 
could even program additional module coercions, since they essentially control 
a module’s symbol table. For example, we could add security information to the 
table to be used during linking, as is done in SPIN. 

5.7 Low-Level Dynamic Linking Mechanisms 

A useful reference of low-level, dynamic linking mechanisms may be found in
Franz. One technique that he presents, which has been used to implement
some versions of dlopen (as opposed to the ELF methodology), is called
load-time rewriting. Rather than pay the indirection penalty of using a GOT,
the dynamic linker rewrites each of the call-sites for an external reference with
the correct address.

This technique is a simple form of run-time code generation. Popcorn and
the TAL implementation provide facilities for type-safe run-time code generation,
called Cyclone, that we can use to implement load-time rewriting. Rather
than compile functions to indirect external references through a GOT, we instead
create template functions that abstract their external references. When dyninit
is called, each template function is invoked with the appropriate symbols (found
by calling lookup), returning a custom version of the original function, closed
with respect to the provided symbols. This function is then registered with the
dynamic symbol table using update. The advantage of this approach is that the
process of rewriting can be proven completely safe.
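
The template idea can be imitated with ordinary closures. The sketch below is an analogy only, not Cyclone or TAL code: `symbol_table`, `lookup`, `update`, `area_template`, and the example functions `square` and `area` are hypothetical names introduced for illustration. Each template abstracts the external references of one function, and `dyninit` specializes it once at load time.

```python
from typing import Any, Callable, Dict

symbol_table: Dict[str, Any] = {}          # models the dynamic symbol table


def lookup(name: str) -> Any:
    return symbol_table[name]              # raises KeyError if unresolved


def update(name: str, value: Any) -> None:
    symbol_table[name] = value


def area_template(square: Callable[[float], float]) -> Callable[[float], float]:
    """Template for `area`: abstracts its one external reference, `square`."""
    def area(r: float) -> float:
        return 3 * square(r)               # direct call, no table indirection
    return area


def dyninit() -> None:
    # Specialize each template with the symbols it needs, then register it.
    update("area", area_template(lookup("square")))


update("square", lambda x: x * x)          # exported by an earlier module
dyninit()
```

After `dyninit` runs, the registered `area` is closed with respect to the `square` it was given: rebinding `square` in the table later does not affect it, just as a rewritten call site no longer consults a GOT.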

There are two notable disadvantages. First, mutually recursive functions are
problematic because their template functions must be called in a particular
order. One possible solution is to use one level of indirection for recursive calls,
backpatching the correct values. Another disadvantage is that template
functions make copies of the functions they abstract, rather than filling in the holes
in place; Cyclone’s approach is more general, but not necessary in our context.
However, the overall cost of doing this should be low (especially relative to
verification). We plan to experiment with this approach in future work.

6 Conclusions

We have designed, implemented, and demonstrated TAL/Load, the first com- 
plete type-safe dynamic linking framework for native code. Our approach has 
many advantages: 

— It supports linking of native code so dynamic extensions may be written in 
many source languages. 

— It is composed largely of components already present in the TAL trusted 
computing base, therefore its addition does not overly complicate the code 
verification system. 

— It is expressive enough to support a variety of dynamic linking strategies in 
an efficient manner. 

Furthermore, there is nothing specific to TAL in this strategy; we believe
that in principle it would also be applicable to Proof Carrying Code (with some
changes to verification condition generation). We see this work as the first step
in a larger study of type-safe extensible systems.



Abstract. Linear type systems permit programmers to deallocate or
explicitly recycle memory, but are severely restricted by the fact that they
admit no aliasing. This paper describes a pseudo-linear type system that
allows a degree of aliasing and memory reuse as well as the ability to
define complex recursive data structures. Our type system can encode
conventional linear data structures such as linear lists and trees as well
as more sophisticated data structures including cyclic and doubly-linked
lists and trees. In the latter cases, our type system is expressive enough
to represent pointer aliasing and yet safely permit destructive operations
such as object deallocation. We demonstrate the flexibility of our type
system by encoding two common space-conscious algorithms:
destination-passing style and Deutsch-Schorr-Waite or “link-reversal” traversal
algorithms.



1 Introduction 

Type-safe programming languages, such as Haskell, Java, and ML, do not give 
programmers control over memory management. In particular, these languages 
do not allow programmers to separate allocation and initialization of memory 
objects, nor do they allow explicit re-use of memory objects. Rather, allocation 
and initialization of objects are presented to the programmer as an atomic ope- 
ration, and re-use of memory is achieved “under the covers” through garbage 
collection. In other words, memory management is achieved by meta-linguistic 
mechanisms that are largely outside the control of the programmer. 

In type-unsafe languages such as C or C++, programmers have control over
memory management so they can tailor routines for application-specific
constraints, where the time and/or space overheads of general-purpose memory
management mechanisms are too high. However, such languages have a far more
complicated and error-prone programming model. In particular, neither the
static type systems, the compilers, nor the run-time systems of these languages
prevent the accidental use of uninitialized objects, or the accidental re-use of
memory at an incompatible type. Such errors are extremely costly to diagnose
and correct.

Our ultimate goal is to provide support for programmer-controlled memory
management, without sacrificing type-safety, and without incurring significant
overhead. In addition, we hope to discover general typing mechanisms and
principles that allow greater latitude in the design of low-level languages intended
for systems applications or as the target of certifying compilers. In this
paper, we take a further step towards these goals by developing a type system
that gives fine-grained control over memory management, for a rich class of
recursively defined datatypes. We demonstrate the power of the type system
by showing how we can safely encode two important classes of optimization,
destination-passing style and link-reversal traversals of data structures.

1.1 Background 

One well-known principle for proving type safety is based upon type-invariance of 
memory locations. Simply put, this property says that, when allocated, a memory 
object should (conceptually) be stamped with its type, and that the type of the 
object should not change during evaluation. When this property is maintained, 
it is straightforward to prove a subject-reduction or type-preservation property,
which is in turn crucial to establishing type-soundness.
There are many examples from language design where this principle has been 
violated and resulted in an unsoundness. For instance, the naive treatment of 
polymorphic references in an ML-like language, or the covariant treatment of 
arrays in a Java-like language, both violate this basic principle. 

From the type-invariance principle, it becomes clear why most type-safe
languages do not support user-level initialization or memory recycling: the type $\tau$
of the memory object cannot change, so (1) it must initially have type $\tau$ and (2)
must continue to have type $\tau$ after an evaluation step. Atomic allocation and
initialization ensures the first invariant, and the lack of explicit recycling ensures
the second. Thus, it appears that some meta-linguistic mechanism is necessary
to achieve memory management when the type-invariance principle is employed.

Linear type systems employ a different principle to achieve
subject-reduction. In a linear setting, the crucial invariant is that memory objects must
have exactly one reference; that is, no object can be aliased. Unlike the
traditional approach, the type of a memory object can change over time and thus,
explicit initialization and recycling can be performed in the language.
Unfortunately, the inability to share objects through aliasing can have a steep cost: many
common and efficient data structures that use sharing or involve cycles cannot
be implemented.

In recent previous work, we considered a generalization of linear types that
supported a very limited degree of aliasing. Like linear type systems, our
alias types supported separation of allocation and initialization, and explicit
re-use of memory, but unlike linear approaches, some objects could have more
than one reference. To achieve subject reduction, we tracked aliasing in the type
system by giving memory objects unique names, and maintained the invariant
that the names were unique. We found that alias types unified a number of
ad-hoc features in our Typed Assembly Language, including the treatment of
initialization and control stacks. Furthermore, the alias type constructors were
easy to add to our type checker for TALx86.

Unfortunately, the named objects in our alias-type system were restricted
to a “second-class” status; though named objects could be passed to and from
functions, the type system prevented a programmer from placing these objects
in a recursive datatype such as a list or tree. The problem is that our type
system did not track aliasing beyond a certain compile-time “frontier”, and in
this respect, was similar to the k-limiting approaches used in alias analysis.
As a result, we could not embed linear datatypes into our language, and the
opportunities for user-level memory management were greatly reduced.



In this paper, we extend alias types to cover recursive datatypes in full
generality. Our type system is powerful enough to encode linear variants of lists
and trees, as well as richer data structures with complex shapes and aliasing
relationships, such as cyclic or doubly-linked lists and trees. The critical addition
to the type system is a mechanism for combining recursive type operators with
first-class store abstractions that represent repeated patterns of aliasing. In this
respect, our work is inspired by the more complex approaches to alias and shape
analysis that have recently appeared in the literature.

The generalization to recursive datatypes opens the door for users or
certifying compilers to have far more control over the memory management of complex
data structures. To demonstrate this fact, we show how two classes of space
optimization can be encoded in a language based on recursive alias types. The
first optimization, called destination-passing style, transforms algorithms
that are “tail-recursive modulo allocation” into properly tail-recursive
algorithms, thereby avoiding the space overheads of a control stack. The
second optimization shows how we can safely encode Deutsch-Schorr-Waite
algorithms for traversing a tree using minimal additional space, based on link
reversal.
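
For readers unfamiliar with destination-passing style, the following Python sketch contrasts the two forms of list append. It illustrates the transformation only and says nothing about typing; the names `Cell`, `append_rec`, `append_dps`, `from_list`, and `to_list` are hypothetical. Python does not eliminate tail calls, so the loop in `append_dps` stands in for the properly tail-recursive call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    head: int
    tail: Optional["Cell"] = None


def append_rec(xs: Optional[Cell], ys: Optional[Cell]) -> Optional[Cell]:
    """Tail-recursive modulo allocation: the Cell is built after the call."""
    if xs is None:
        return ys
    return Cell(xs.head, append_rec(xs.tail, ys))


def append_dps(xs: Optional[Cell], ys: Optional[Cell]) -> Optional[Cell]:
    """Destination-passing style: allocate first, fill the hole later."""
    if xs is None:
        return ys
    first = Cell(xs.head)          # tail is an unfilled "hole" for now
    dest = first                   # destination whose tail must be filled
    xs = xs.tail
    while xs is not None:          # plays the role of the tail call
        dest.tail = Cell(xs.head)  # fill the hole with a fresh cell ...
        dest = dest.tail           # ... whose own tail is the next hole
        xs = xs.tail
    dest.tail = ys                 # final write initializes the last hole
    return first


def from_list(items) -> Optional[Cell]:
    out = None
    for item in reversed(items):
        out = Cell(item, out)
    return out


def to_list(cell: Optional[Cell]) -> list:
    items = []
    while cell is not None:
        items.append(cell.head)
        cell = cell.tail
    return items
```

The cell is allocated before its tail is known, which is exactly the separation of allocation from initialization that conventional type-safe languages do not expose.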



In the following section, we motivate the type structure of the language by
introducing a series of type-theoretic abstraction mechanisms that enable
suitable approximations of the store. We then show how these constructors may be
used to encode a number of common data structures, without losing the ability
to explicitly manage memory. Section 3 formalizes these ideas by presenting the
syntax and static semantics of a programming language that includes
instructions for allocating, deallocating, and overwriting memory objects. We then
show how the destination-passing style and link-reversal optimizations can be
safely encoded in the language, present an operational semantics for
the language, and state a type soundness theorem. We close by
discussing some of the limitations of this work and how they might be addressed
as well as giving more detail on related research.

2 Types for Describing Store Shapes

The linear pair $\tau_1 \otimes \tau_2$ captures an extremely valuable memory management
invariant: There is only one access path to any value with this type. Consequently,
if $x$ has type $\tau_1 \otimes \tau_2$ then once both its components have been extracted, it
is safe to reuse $x$ to store new values with incompatible types. Since the only
way to access $x$’s data is through $x$ itself, there is no chance that this reuse can
introduce inconsistent views of the store and unsoundness into the system.

Unfortunately, the restriction to a single access path makes it impossible
to construct a number of important data structures. Our goal is to lift this
restriction and yet retain the capacity to reuse or deallocate memory when there
is a pointer to it. Our approach is based on the intuition that a linear data
structure may be decomposed into two parts, a piece of state and a pointer to
that state. Destructive operations such as memory reuse alter only the state
component and leave the pointer part unchanged. Consequently, if the goal is to
ensure no inconsistencies arise, only the state component need be treated linearly.
The pointer may be freely copied, making it possible to construct complex data
structures with shared parts. Of course, in order to actually use a pointer, there
must be some way to relate it to the state it points to. We make this relationship
explicit in the type system by introducing locations, $\ell$, that contain the state
component, and by specializing the type of a pointer to indicate the location it
points to. Consider again the linear pair $\tau_1 \otimes \tau_2$. We factor it into two parts:

— A type for the state, called an aliasing constraint or store description, that
takes the form $\{\ell \mapsto \langle \tau_1, \tau_2 \rangle\}$. This type states that at location $\ell$ there exists
a memory block containing objects with types $\tau_1$ and $\tau_2$.

— A type for a pointer to the location: $\mathit{ptr}(\ell)$. This type is a singleton type:
any pointer described by this type is a pointer to the one location $\ell$ and to
no other location.



This simple trick provides a tremendous flexibility advantage over
conventional linear type systems because even though constraints may not alias one
another, there is no explicit restriction on the way pointer types may be
manipulated.

We build complicated data structures by joining a number of aliasing
constraints together using the $\otimes$ constructor. For example, a DAG in which
location $\ell_1$ points to both $\ell_2$ and $\ell_3$, and $\ell_2$ points to $\ell_3$, may
be specified by the constraints below.

$$\{\ell_1 \mapsto \langle \mathit{ptr}(\ell_3), \mathit{ptr}(\ell_2) \rangle\} \otimes \{\ell_2 \mapsto \langle \mathit{ptr}(\ell_3) \rangle\} \otimes \{\ell_3 \mapsto \langle \mathit{int} \rangle\}$$



The most important property of $\otimes$ is that it joins descriptions for separate
portions of the store. In this respect, it is identical to the “spatial conjunction”
studied by Reynolds and by Ishtiaq and O’Hearn. This separation property
makes it possible to reason about potential aliasing relationships. For example,
if a store is described by constraints $\{\ell_1 \mapsto \tau_1\} \otimes \cdots \otimes \{\ell_n \mapsto \tau_n\}$ then each
location $\ell_i$ on the left-hand side of one of the constraints must be different from
all other locations on the left-hand side. This invariant resembles invariants for
the typing context of a standard linear type system. For example, the linear
context $x_1{:}\tau_1, \ldots, x_n{:}\tau_n$ implies that the $x_i$ are distinct values with linear types
$\tau_i$. However, the analogy is not exact because a linear type system prevents any
of the $x_i$ from being used more than once whereas our calculus allows pointers
to the locations $\ell_i$ to be used over and over again on the right-hand sides of
constraints. This flexibility makes it possible to represent aliasing. For instance,
in the example above, there are two paths from location $\ell_1$ to location $\ell_3$, one
direct, and one indirect through location $\ell_2$.
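
The separation property can be modelled directly. The Python sketch below is an informal illustration, not part of the calculus: a store description is represented as a dictionary from location names to blocks, a field is either the string `"int"` or a pair `("ptr", location)`, and the helper names `join` and `paths` are assumptions made for this example. Joining rejects a location that is described twice, while pointers on the right-hand sides may mention any location any number of times.

```python
from typing import Dict, Tuple

# A field is either "int" or ("ptr", location); a block is a tuple of fields.
Store = Dict[str, Tuple]


def join(*constraints: Store) -> Store:
    """Model of the join: descriptions must cover disjoint locations."""
    joined: Store = {}
    for constraint in constraints:
        for loc, block in constraint.items():
            if loc in joined:
                raise ValueError(f"location {loc} described twice")
            joined[loc] = block
    return joined


def paths(store: Store, src: str, dst: str) -> int:
    """Count access paths from src to dst (assumes the store is acyclic)."""
    if src == dst:
        return 1
    return sum(paths(store, entry[1], dst)
               for entry in store[src]
               if isinstance(entry, tuple) and entry[0] == "ptr")


# The DAG example: l1 points to l3 and l2, and l2 points to l3.
dag = join({"l1": (("ptr", "l3"), ("ptr", "l2"))},
           {"l2": (("ptr", "l3"),)},
           {"l3": ("int",)})
```

Counting paths from `l1` to `l3` in this model finds both the direct and the indirect one, which is the aliasing that a conventional linear type system could not describe.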

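One other important invariant is that the ordering of the constraints joined by
$\otimes$ is not important: $\{\ell_1 \mapsto \tau_1\} \otimes \{\ell_2 \mapsto \tau_2\}$ is equivalent to
$\{\ell_2 \mapsto \tau_2\} \otimes \{\ell_1 \mapsto \tau_1\}$.

For the sake of brevity, we often abbreviate $\{\ell_1 \mapsto \tau_1\} \otimes \cdots \otimes \{\ell_n \mapsto \tau_n\}$ with
$\{\ell_1 \mapsto \tau_1, \ldots, \ell_n \mapsto \tau_n\}$.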

2.1 Abstraction Mechanisms 

Any particular store can be represented exactly using these techniques¹, even
stores containing cyclic data structures. For example, a node containing a pointer
to itself may be represented with the type $\{\ell \mapsto \langle \mathit{ptr}(\ell) \rangle\}$. However, the principal
difficulty in describing aliasing relationships is not specifying one particular store
but being able to specify a class of stores using a single compact representation.
We use the following type-theoretic abstraction mechanisms to describe a wide
class of pointer-rich data structures.

Location Polymorphism. In general, the particular location $\ell$ that contains an
object is inconsequential to the algorithm being executed. The relevant
information is the connection between the location $\ell$, the contents of the memory residing
there, and the pointers $\mathit{ptr}(\ell)$ to that location. Routines that only operate on
specific concrete locations are almost useless. If, for example, the dereference
function could only operate on a single concrete location $\ell$, we would have to
implement a different dereference function for every location we allocate in the
store! By introducing location polymorphism, it is possible to abstract away from
the concrete location $\ell$ using a variable location $\rho$, but retain the necessary
dependencies. We use the meta-variable $\eta$ to refer to locations generically (either
concrete or variable).

Store Polymorphism. Any specific routine only operates over a portion of the
store. In order to use that routine in multiple contexts, we abstract irrelevant
portions of the store using store polymorphism. A store described by the
constraints $\epsilon \otimes \{\eta \mapsto \tau\}$ contains some store of unknown size and shape $\epsilon$ as well as
a location $\eta$ containing objects with type $\tau$.



¹ We cannot represent a store containing a pointer into the middle of a memory block.

Unions. Unlike polymorphic types, unions provide users with the abstraction of
one of a finite number of choices. A memory block that holds either an integer or
a pointer may be encoded using the type $\langle \mathit{int} \rangle \cup \langle \mathit{ptr}(\eta) \rangle$. However, in order to use
the contents of the block safely, there must be some way to detect which element
of the union the underlying value actually belongs to. There are several ways to
perform this test: through a pointer equality test with an object of known type,
by discriminating between small integers (including null/0) and pointers, or by
distinguishing between components using explicit tags. All of these options will
be useful in an implementation, but here we concentrate on the third option; the
others are discussed further later in the paper. Hence, the alternatives above will be encoded
using the type $\langle S(1), \mathit{int} \rangle \cup \langle S(2), \mathit{ptr}(\eta) \rangle$ where $S(i)$ is another form of singleton
type: the type containing only the integer $i$.

Recursion. As yet, we have defined no mechanism for describing regular
repeated structure in the store. We use standard recursive types of the form
$\mu\alpha.\tau$ to capture this notion. However, recursion by itself is not enough.
Consider an attempt to represent a store containing a linked list in the obvious
way: $\{\eta \mapsto \mu\alpha.\langle S(1) \rangle \cup \langle S(2), \alpha \rangle\}$.² An unfolding of this definition results in
the type

$$\{\eta \mapsto \langle S(1) \rangle \cup \langle S(2), \langle S(1) \rangle \cup \langle S(2), \mathit{List} \rangle \rangle\}$$

rather than the type

$$\{\eta \mapsto \langle S(1) \rangle \cup \langle S(2), \mathit{ptr}(\eta') \rangle,\ \eta' \mapsto \langle S(1) \rangle \cup \langle S(2), \mathit{List} \rangle\}$$

The former type describes a number of memory blocks flattened into the same
location whereas the latter type describes a linked collection of disjoint nodes.

Encapsulation. In order to represent linked recursive structures properly, each
unfolding must encapsulate its own portion of the store. We use an existential
type for this purpose. Hence, a sensible representation for linked lists is

$$\mu\alpha.\langle S(1) \rangle \cup \exists[\rho{:}\mathsf{Loc} \mid \{\rho \mapsto \alpha\}].\langle S(2), \mathit{ptr}(\rho) \rangle$$

The existential $\exists[\rho{:}\mathsf{Loc} \mid \{\rho \mapsto \tau_1\}].\tau_2$ may be read “there exists some location $\rho$,
different from all others in the program, such that $\rho$ contains an object of type
$\tau_1$, and the value contained in this data structure has type $\tau_2$.” More generally, an
existential has the form $\exists[\Delta \mid C].\tau$. It abstracts a sequence of type, location and
store variables with their kinds, $\Delta$, and encapsulates a store fragment described
by $C$. In our examples, we will omit the kinds from the sequence $\Delta$ as they are
clear from context. A similar definition gives rise to trees:

$$\mu\alpha.\langle S(1) \rangle \cup \exists[\rho_1, \rho_2 \mid \{\rho_1 \mapsto \alpha, \rho_2 \mapsto \alpha\}].\langle S(2), \mathit{ptr}(\rho_1), \mathit{ptr}(\rho_2) \rangle$$

Notice that the existential abstracts a pair of locations and that both locations
are bound in the store. From this definition, we can infer that the two subtrees
are disjoint. For the sake of contrast, a DAG in which every node has a pair of
pointers to a single successor is coded as follows. Here, reuse of the same location
variable $\rho$ indicates aliasing.

$$\mu\alpha.\langle S(1) \rangle \cup \exists[\rho \mid \{\rho \mapsto \alpha\}].\langle S(2), \mathit{ptr}(\rho), \mathit{ptr}(\rho) \rangle$$
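
The difference between the tree type and the DAG type can be seen by checking concrete heaps against them. The Python sketch below is an informal run-time analogue of what the types state statically, with assumptions labelled: a heap is a dictionary from location names to blocks, the block `(1,)` models a leaf tagged $S(1)$, a block `(2, p, q)` models a node tagged $S(2)$ with two pointers, and `is_tree` and `is_shared_dag` are hypothetical helper names. Each unfolding claims the locations it encapsulates, so a location reached twice is rejected by the tree check.

```python
from typing import Dict, Set, Tuple

Heap = Dict[str, Tuple]   # location -> block; (1,) is a leaf, (2, ...) a node


def is_tree(heap: Heap, loc: str, owned: Set[str]) -> bool:
    """Check loc against the tree type; each unfolding claims fresh locations."""
    if loc in owned or loc not in heap:
        return False                      # location already claimed: aliasing
    owned.add(loc)
    block = heap[loc]
    if block == (1,):
        return True
    if len(block) == 3 and block[0] == 2:
        return is_tree(heap, block[1], owned) and is_tree(heap, block[2], owned)
    return False


def is_shared_dag(heap: Heap, loc: str, owned: Set[str]) -> bool:
    """Check loc against the DAG type: both pointers name the same location."""
    if loc in owned or loc not in heap:
        return False
    owned.add(loc)
    block = heap[loc]
    if block == (1,):
        return True
    if len(block) == 3 and block[0] == 2 and block[1] == block[2]:
        return is_shared_dag(heap, block[1], owned)
    return False


tree = {"a": (2, "b", "c"), "b": (1,), "c": (1,)}
dag = {"a": (2, "b", "b"), "b": (1,)}
```

The heap `dag` fails the tree check because its two pointers name the same location, and `tree` fails the DAG check because they name different ones; the two types describe disjoint classes of non-trivial stores.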

² Throughout we use the convention that union binds tighter than the recursion
operator.

Lists where the terminal node points back to the head and trees where the leaves
point back to their roots can be encoded as follows.

$$\mathit{CircularList} = \{\rho_1 \mapsto \mu\alpha.\langle S(1), \mathit{ptr}(\rho_1) \rangle \cup \exists[\rho_2 \mid \{\rho_2 \mapsto \alpha\}].\langle S(2), \mathit{ptr}(\rho_2) \rangle\}$$

$$\mathit{CircularTree} = \{\rho_1 \mapsto \mu\alpha.\langle S(1), \mathit{ptr}(\rho_1) \rangle \cup \exists[\rho_2, \rho_3 \mid \{\rho_2 \mapsto \alpha, \rho_3 \mapsto \alpha\}].\langle S(2), \mathit{ptr}(\rho_2), \mathit{ptr}(\rho_3) \rangle\}$$

Parameterized Recursive Types. One common data structure we are unable to
encode with the types described so far is the doubly-linked list. Recursive
types only “unfold” in one direction, making it easy to represent pointers from
a parent “down” to its children, or all the way back up to the top-level store,
but much more difficult to represent pointers that point back up from children
to their parents, which is the case for doubly-linked lists or trees with pointers
back to their parent nodes. Our solution to this problem is to use parameterized
recursive types to pass a parent location down to its children. In general, a
parameterized recursive type has the form $\mathsf{rec}\,\alpha\,(\beta_1{:}\kappa_1, \ldots, \beta_n{:}\kappa_n).\tau$ and has kind
$(\kappa_1, \ldots, \kappa_n) \rightarrow \mathsf{Type}$. We will continue to use unparameterized recursive types
$\mu\alpha.\tau$ in examples and consider them to be an abbreviation for $\mathsf{rec}\,\alpha\,().\tau[\alpha\,()/\alpha]$.
Once again, kinds will be omitted when they are clear from the context. Trees
in which each node has a pointer to its parent may be encoded as follows.

$$\{\rho_{root} \mapsto \langle S(2), \mathit{ptr}(\rho_L), \mathit{ptr}(\rho_R) \rangle\} \otimes \{\rho_L \mapsto \mathit{REC}\,(\rho_{root}, \rho_L)\} \otimes \{\rho_R \mapsto \mathit{REC}\,(\rho_{root}, \rho_R)\}$$

where

$$\mathit{REC} = \mathsf{rec}\,\alpha\,(\rho_{prt}, \rho_{curr}).\ \langle S(1), \mathit{ptr}(\rho_{prt}) \rangle \cup \exists[\rho_L, \rho_R \mid \{\rho_L \mapsto \alpha\,(\rho_{curr}, \rho_L)\} \otimes \{\rho_R \mapsto \alpha\,(\rho_{curr}, \rho_R)\}].\ \langle S(2), \mathit{ptr}(\rho_L), \mathit{ptr}(\rho_R), \mathit{ptr}(\rho_{prt}) \rangle$$

The tree has a root node in location $\rho_{root}$ that points to a pair of children in
locations $\rho_L$ and $\rho_R$, each of which is defined by the recursive type REC. REC
has two arguments, one for the location of its immediate parent $\rho_{prt}$ and one for
the location of the current node $\rho_{curr}$. Either the current node is a leaf, in which
case it points back to its immediate parent, or it is an interior node, in which
case it contains pointers to its two children $\rho_L$ and $\rho_R$ as well as a pointer to
its parent. The children are defined recursively by providing the location of the
current node ($\rho_{curr}$) for the parent parameter and the location of the respective
child ($\rho_L$ or $\rho_R$) for the current pointer.
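
How the parent parameter is threaded through the unfoldings can be illustrated with a small checker. This Python sketch is a run-time analogue under stated assumptions, not the type system itself: heaps are dictionaries from location names to tuples, a leaf is modelled as `(1, parent)`, an interior node as `(2, left, right, parent)`, the root as `(2, left, right)`, and `check_rec` and `check_root` are hypothetical names.

```python
from typing import Dict, Set, Tuple

Heap = Dict[str, Tuple]


def check_rec(heap: Heap, parent: str, curr: str, owned: Set[str]) -> bool:
    """Check the block at `curr` against REC(parent, curr)."""
    if curr in owned or curr not in heap:
        return False                       # each location is claimed once
    owned.add(curr)
    block = heap[curr]
    if block == (1, parent):               # leaf: points back to its parent
        return True
    if len(block) == 4 and block[0] == 2 and block[3] == parent:
        # Children receive `curr` as their parent parameter.
        return (check_rec(heap, curr, block[1], owned)
                and check_rec(heap, curr, block[2], owned))
    return False


def check_root(heap: Heap, root: str) -> bool:
    """Check a root block that has two children and no parent pointer."""
    block = heap.get(root, ())
    if len(block) != 3 or block[0] != 2:
        return False
    owned = {root}
    return (check_rec(heap, root, block[1], owned)
            and check_rec(heap, root, block[2], owned))


good = {"root": (2, "l", "r"),
        "l": (1, "root"),
        "r": (2, "rl", "rr", "root"),
        "rl": (1, "r"),
        "rr": (1, "r")}
bad = dict(good, rl=(1, "root"))           # leaf names the wrong parent
```

In `bad`, the leaf `rl` points to the root instead of its immediate parent `r`, so the check fails exactly where the instantiation of the parent parameter is violated.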

Function Types. Functions are polymorphic with type arguments $\Delta$ and they
express the shape of the store ($C$) required by the function:
$\forall[\Delta \mid C].(\tau_1, \ldots, \tau_n) \rightarrow 0$.
The underlying term language will be written in continuation-passing style and
therefore functions never return, but instead call another function (the function’s
continuation). We use the notation “$\rightarrow 0$” to indicate this fact.
Continuation-passing style is extremely convenient in this setting because it makes the flow of
control explicit in the language and the store shape varies from one control-flow
point to the next.



2.2 Summary of Type Structure 

Figure 1 defines the formal syntax for the type constructor language. We use $\beta$
to range over type constructor variables generically. When we want to be more
precise, we use $\rho$, $\epsilon$ and $\alpha$ to range over location, store and type variables. A
type context $\Delta$ is a sequence of bindings of the form $\beta_1{:}\kappa_1, \ldots, \beta_n{:}\kappa_n$ where
none of the $\beta_i$ are repeated. The domain of $\Delta$, denoted $\mathit{Dom}(\Delta)$, is the sequence
$\beta_1, \ldots, \beta_n$. The type constructor language itself contains all the types discussed
in the previous subsection and one other, the junk type. Objects of type junk
are unusable and arise during the initialization of data structures. A later section
contains further explanation.

A judgement $\Delta \vdash c : \kappa$ states that under type context $\Delta$, the type constructor
$c$ is well-formed and has kind $\kappa$. Locations have kind Loc, aliasing constraints
have kind Store, and types have kind Type. Recursive types have arrow kinds
that can be eliminated through constructor application $c\,(c_1, \ldots, c_n)$. The
judgement $\Delta \vdash c_1 = c_2 : \kappa$ states that type constructors $c_1$ and $c_2$ are equivalent and
well-formed with kind $\kappa$. Types are considered equivalent up to alpha-conversion
of bound variables and constraints are considered equivalent up to reordering of
the elements in the sequence. A recursive type is not considered equal to its
unfolding. The formal rules for these judgements are straightforward and they
appear in the appendix.

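$$
\begin{array}{llcl}
\text{kinds} & \kappa & ::= & \mathsf{Loc} \mid \mathsf{Store} \mid \mathsf{Type} \mid (\kappa_1, \ldots, \kappa_n) \rightarrow \mathsf{Type} \\
\text{constructor vars} & \beta & ::= & \rho \mid \epsilon \mid \alpha \\
\text{constructor ctxts} & \Delta & ::= & \cdot \mid \Delta, \beta{:}\kappa \\
\text{con's} & c & ::= & \eta \mid C \mid \tau \\
\text{locations} & \eta & ::= & \rho \mid \ell \\
\text{store types} & C & ::= & \emptyset \mid C \otimes \{\eta \mapsto \tau\} \mid C \otimes \epsilon \\
\text{types} & \tau & ::= & \alpha \mid \mathit{junk} \mid \mathit{int} \mid S(i) \mid \mathit{ptr}(\eta) \mid \langle \tau_1, \ldots, \tau_n \rangle \mid \tau_1 \cup \tau_2 \mid {} \\
 & & & \forall[\Delta \mid C].(\tau_1, \ldots, \tau_n) \rightarrow 0 \mid \exists[\Delta \mid C].\tau \mid {} \\
 & & & \mathsf{rec}\,\alpha\,(\Delta).\tau \mid c\,(c_1, \ldots, c_n)
\end{array}
$$

Fig. 1. Type Structure: Syntax

We use the notation $A[X/x]$ to denote the capture-avoiding substitution of $X$
for a variable $x$ in $A$. Occasionally, we use the notation $X[c_1, \ldots, c_n/\Delta]$ to denote
capture-avoiding substitution of constructors $c_1, \ldots, c_n$ for the corresponding
type variables in $\mathit{Dom}(\Delta)$. Substitution is defined in the standard way in all
cases except for the substitution of constraints in constraints. Substitution of $C'$
for a constraint variable $\epsilon$ in $C$ appends the list $C'$ to the list $C$. We use the
notation $C'@C$ to denote the result of appending $C$ to $C'$ (notice that $C \otimes C'$
is not syntactically well-formed). For example,

$$(\emptyset \otimes a_1 \otimes \cdots \otimes a_m)@(\emptyset \otimes a'_1 \otimes \cdots \otimes a'_n) = \emptyset \otimes a_1 \otimes \cdots \otimes a_m \otimes a'_1 \otimes \cdots \otimes a'_n$$

Formally, substitution for constraints is defined as follows.

$$(C \otimes \epsilon)[C'/\epsilon] = (C[C'/\epsilon])@C'$$

We will continue to omit the initial “$\emptyset$” when a constraint is non-empty. For
example, we write $\{\eta \mapsto \tau\}$ instead of $\emptyset \otimes \{\eta \mapsto \tau\}$.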
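
The append-based substitution can be made concrete with lists. The Python sketch below is illustrative only and makes two simplifying assumptions: a constraint is modelled as a list of elements that are either `("loc", eta, tau)` or `("var", epsilon)`, and substitution is applied only to the top-level list, not inside the types `tau`. The names `append` and `subst` are hypothetical.

```python
from typing import List, Tuple

# A constraint is a list of elements: ("loc", eta, tau) or ("var", epsilon).
Constraint = List[Tuple]


def append(c1: Constraint, c2: Constraint) -> Constraint:
    """Models c1 @ c2: the elements of c2 are appended to those of c1."""
    return c1 + c2


def subst(c: Constraint, c_new: Constraint, eps: str) -> Constraint:
    """Models c[c_new/eps] on the top-level list of constraint elements."""
    result: Constraint = []
    for elem in c:
        if elem == ("var", eps):
            result = append(result, c_new)   # (C ⊗ ε)[C'/ε] = (C[C'/ε]) @ C'
        else:
            result = append(result, [elem])  # other elements are kept as-is
    return result


# Usage: replace the store variable "e" by two concrete constraints.
c = [("loc", "l1", "int"), ("var", "e"), ("var", "e2")]
out = subst(c, [("loc", "l2", "int"), ("loc", "l3", "int")], "e")
```

The result is again a flat list, which is why appending is used: splicing a whole constraint in as a single element would produce the ill-formed shape $C \otimes C'$.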



3 Term Structure 

The term structure is split into three classes: small values, instructions, and
coercions. Figure 2 describes the syntax of the language.



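$$
\begin{array}{llcl}
\text{small values} & v & ::= & x \mid i \mid S(i) \mid v[c] \mid \mathsf{fix}\,f[\Delta \mid C](x_1{:}\tau_1, \ldots, x_n{:}\tau_n).\iota \\
\text{instructions} & \iota & ::= & \mathsf{new}\ \rho, x, i;\ \iota \mid \mathsf{free}\ v;\ \iota \mid \mathsf{let}\ x = v.i;\ \iota \mid v_1.i := v_2;\ \iota \mid {} \\
 & & & \mathsf{case}\ v\ (\mathsf{inl} \Rightarrow \iota_1 \mid \mathsf{inr} \Rightarrow \iota_2) \mid v(v_1, \ldots, v_n) \mid \mathsf{halt}\ v \mid {} \\
 & & & \mathsf{coerce}(\gamma);\ \iota \\
\text{coercions} & \gamma & ::= & \mathsf{union}_{\tau_1 \cup \tau_2}(\eta) \mid \mathsf{roll}_{\mathsf{rec}\,\alpha\,(\Delta).\tau\ (c_1, \ldots, c_n)}(\eta) \mid {} \\
 & & & \mathsf{unroll}(\eta) \mid \mathsf{pack}_{[\ldots]}(\eta) \mid \mathsf{unpack}_{\Delta}(\eta)
\end{array}
$$

Fig. 2. Term Structure: Syntax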



3.1 Small Values 

Small values include integers ($i$) and singleton integers $S(i)$. These two different
sorts of integer can be implemented using the same representation. The
annotation $S(\cdot)$ is present only to guide the type checker. Plain integers are given the
general type $\mathit{int}$ and singletons are given the specific type $S(i)$.

Functions are considered small for the purposes of this paper; we will not
concern ourselves with the problem of collecting function closures here.³
Functions may be recursive and contain a specification of the polymorphic variables
$\Delta$, the requirements on the store $C$ and the types of the parameters. These
preconditions are used to type the instruction sequence that forms the body of

Footnote: Programmers may explicitly construct their own closures using the existential types
we provide to hide the type of the closure environment. We do not closure
convert our code here as it would serve only to complicate the discussion.






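$$\Delta; \Gamma \vdash v : \tau$$

$$
\frac{}{\Delta; \Gamma \vdash x : \Gamma(x)}
\qquad
\frac{}{\Delta; \Gamma \vdash i : int}
\qquad
\frac{}{\Delta; \Gamma \vdash S(i) : S(i)}
$$

$$
\frac{\Delta \vdash \forall[\Delta' \mid C'].(\tau_1, \ldots, \tau_n) \rightarrow 0 = \tau_f : Type
\qquad \Delta, \Delta'; C'; \Gamma, f{:}\tau_f, x_1{:}\tau_1, \ldots, x_n{:}\tau_n \vdash \iota}
{\Delta; \Gamma \vdash fix\ f[\Delta' \mid C'](x_1{:}\tau_1, \ldots, x_n{:}\tau_n).\iota : \tau_f}
$$

$$
\frac{\Delta; \Gamma \vdash v : \forall[\beta{:}\kappa, \Delta' \mid C'].(\tau_1, \ldots, \tau_n) \rightarrow 0
\qquad \Delta \vdash c : \kappa}
{\Delta; \Gamma \vdash v[c] : (\forall[\Delta' \mid C'].(\tau_1, \ldots, \tau_n) \rightarrow 0)[c/\beta]}
$$

$$
\frac{\Delta; \Gamma \vdash v : \tau' \qquad \Delta \vdash \tau' = \tau : Type}
{\Delta; \Gamma \vdash v : \tau}
$$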



Fig. 3. Static Semantics: Values 

the function. The value $v[c]$ denotes type application of the polymorphic function
$v$ to type constructor $c$. We often abbreviate successive type applications
$v[c_1] \cdots [c_n]$ by $v[c_1, \ldots, c_n]$. Later, when we give an operational semantics for
the language (Sect. 5), we will add other small values, including pointers, but
these objects are not manipulated by programmers (they only appear during
run-time evaluation of programs) and so we omit them for now.

The typing judgements for small values have the form $\Delta; \Gamma \vdash v : \tau$, where $\Gamma$
is a finite partial map from value variables to small types. The rules are mostly
standard and are presented in Fig. 3.



3.2 Instructions 

Figure 4 presents the typing rules for the instructions. The judgement $\Delta; C; \Gamma \vdash \iota$
states that in type context $\Delta$, a store described by $C$ and value context $\Gamma$, the
instruction sequence $\iota$ is well-formed.

Memory Management Instructions. The principal interest of the language is the
typing of memory management instructions. Operationally, the new $\rho, x, i$
instruction allocates a memory block of size $i$ at a fresh location and substitutes
the location for $\rho$ and a pointer to that location for $x$ in the remaining
instructions. This operation is modeled in the type system by extending the store
description with a memory type of length $i$. Initially, the fields of the memory
block are filled with uninitialized junk. Once a block has been allocated, it may
be operated on by accessor functions let $x = v.i$ and $v_1.i := v_2$, which project
from or store into the $i$th field of $v$ or $v_1$, respectively. The projection operation is well-formed
if $v$ is a pointer to some location $\eta$ and that location contains an object with
Footnote: For the purposes of alpha-conversion, $\rho$ and $x$ are considered bound by this
instruction.






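$$\Delta; C; \Gamma \vdash \iota$$

$$
\frac{\Delta, \rho{:}Loc;\ C \otimes \{\rho \mapsto \langle junk_1, \ldots, junk_i \rangle\};\ \Gamma, x{:}ptr(\rho) \vdash \iota}
{\Delta; C; \Gamma \vdash new\ \rho, x, i;\ \iota}
\quad (x \notin Dom(\Gamma),\ \rho \notin Dom(\Delta))
$$

$$
\frac{\Delta; \Gamma \vdash v : ptr(\eta)
\qquad \Delta \vdash C = C' \otimes \{\eta \mapsto \langle \tau_1, \ldots, \tau_n \rangle\} : Store
\qquad \Delta; C'; \Gamma \vdash \iota}
{\Delta; C; \Gamma \vdash free\ v;\ \iota}
$$

$$
\frac{\Delta; \Gamma \vdash v : ptr(\eta)
\qquad \Delta \vdash C = C' \otimes \{\eta \mapsto \langle \tau_1, \ldots, \tau_n \rangle\} : Store
\qquad \Delta; C; \Gamma, x{:}\tau_i \vdash \iota}
{\Delta; C; \Gamma \vdash let\ x = v.i;\ \iota}
\quad (1 \le i \le n,\ x \notin Dom(\Gamma))
$$

$$
\frac{\begin{array}{c}
\Delta; \Gamma \vdash v_1 : ptr(\eta)
\qquad \Delta \vdash C = C' \otimes \{\eta \mapsto \langle \tau_1, \ldots, \tau_i, \ldots, \tau_n \rangle\} : Store \\
\Delta; \Gamma \vdash v_2 : \tau
\qquad \Delta;\ C' \otimes \{\eta \mapsto \langle \tau_1, \ldots, \tau, \ldots, \tau_n \rangle\};\ \Gamma \vdash \iota
\end{array}}
{\Delta; C; \Gamma \vdash v_1.i := v_2;\ \iota}
\quad (1 \le i \le n)
$$

$$
\frac{\begin{array}{c}
\Delta; \Gamma \vdash v : ptr(\eta)
\qquad \Delta \vdash C = C' \otimes \{\eta \mapsto \tau_1 \cup \tau_2\} : Store \\
\Delta \vdash \tau_1 = \exists[\Delta_1 \mid C_1]. \cdots \exists[\Delta_j \mid C_j].\langle S(1), \tau'_1, \ldots, \tau'_k \rangle : Type \\
\Delta \vdash \tau_2 = \exists[\Delta'_1 \mid C'_1]. \cdots \exists[\Delta'_m \mid C'_m].\langle S(2), \tau''_1, \ldots, \tau''_n \rangle : Type \\
\Delta;\ C' \otimes \{\eta \mapsto \tau_1\};\ \Gamma \vdash \iota_1
\qquad \Delta;\ C' \otimes \{\eta \mapsto \tau_2\};\ \Gamma \vdash \iota_2
\end{array}}
{\Delta; C; \Gamma \vdash case\ v\ (inl \Rightarrow \iota_1 \mid inr \Rightarrow \iota_2)}
$$

$$
\frac{\Delta; \Gamma \vdash v : \forall[\cdot \mid C].(\tau_1, \ldots, \tau_n) \rightarrow 0
\qquad \Delta; \Gamma \vdash v_1 : \tau_1 \ \cdots\ \Delta; \Gamma \vdash v_n : \tau_n}
{\Delta; C; \Gamma \vdash v(v_1, \ldots, v_n)}
$$

$$
\frac{\Delta; \Gamma \vdash v : int}
{\Delta; C; \Gamma \vdash halt\ v}
\qquad
\frac{\Delta; C \vdash \gamma \Longrightarrow \Delta'; C'
\qquad \Delta'; C'; \Gamma \vdash \iota}
{\Delta; C; \Gamma \vdash coerce(\gamma);\ \iota}
$$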



Fig. 4. Static Semantics: Instructions 



type $\langle \tau_1, \ldots, \tau_n \rangle$ (where $1 \le i \le n$). In this case, the remaining
instructions $\iota$ must be well-formed given the additional assumption that $x$ has type
$\tau_i$. The update operation is similar in that $v_1$ must be a pointer to a location
containing a memory block. However, the remaining instructions are verified in
a context where the type of the memory block has changed: the $i$th field has
type $\tau$, where $\tau$ is the type of the object being stored into that location, but is
otherwise unconstrained. Although surprising at first, this rule is sound because
the constraints behave linearly. Despite the fact that the type of a memory block
at a location changes, each location can only appear once in the domain of a
store type and therefore there is no opportunity to introduce inconsistencies.
Constraints such as $\{\eta \mapsto \tau\} \otimes \{\eta \mapsto \tau'\}$ will never describe a well-formed store.
The instruction free $v$ deallocates the memory block pointed to by $v$. This effect
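$$\Delta; C \vdash \gamma \Longrightarrow \Delta'; C'$$

$$
\frac{\Delta \vdash \tau_1 : Type \qquad \Delta \vdash \tau_2 : Type
\qquad \Delta \vdash C = C' \otimes \{\eta \mapsto \tau_i\} : Store}
{\Delta; C \vdash union_{\tau_1 \cup \tau_2}(\eta) \Longrightarrow \Delta;\ C' \otimes \{\eta \mapsto \tau_1 \cup \tau_2\}}
\quad (\text{for } i = 1 \text{ or } 2)
$$

$$
\frac{\begin{array}{c}
\Delta \vdash \tau = (rec\ \alpha\ (\Delta').\tau')\ (c_1, \ldots, c_n) : Type \\
\Delta \vdash C = C' \otimes \{\eta \mapsto \tau'[rec\ \alpha\ (\Delta').\tau'/\alpha][c_1, \ldots, c_n/\Delta']\} : Store
\end{array}}
{\Delta; C \vdash roll_{\tau}(\eta) \Longrightarrow \Delta;\ C' \otimes \{\eta \mapsto \tau\}}
$$

$$
\frac{\Delta \vdash C = C' \otimes \{\eta \mapsto \tau\} : Store
\qquad \Delta \vdash \tau = (rec\ \alpha\ (\Delta').\tau')\ (c_1, \ldots, c_n) : Type}
{\Delta; C \vdash unroll(\eta) \Longrightarrow \Delta;\ C' \otimes \{\eta \mapsto \tau'[rec\ \alpha\ (\Delta').\tau'/\alpha][c_1, \ldots, c_n/\Delta']\}}
$$

$$
\frac{\begin{array}{c}
\Delta' = \beta_1{:}\kappa_1, \ldots, \beta_n{:}\kappa_n
\qquad \Delta \vdash c_i : \kappa_i \quad (\text{for } 1 \le i \le n) \\
\Delta \vdash C = C'' \otimes \{\eta \mapsto \tau[c_1, \ldots, c_n/\Delta']\} \otimes C'[c_1, \ldots, c_n/\Delta'] : Store
\end{array}}
{\Delta; C \vdash pack_{[c_1, \ldots, c_n \mid C'[c_1, \ldots, c_n/\Delta']]\ as\ \exists[\Delta' \mid C'].\tau}(\eta) \Longrightarrow \Delta;\ C'' \otimes \{\eta \mapsto \exists[\Delta' \mid C'].\tau\}}
$$

$$
\frac{\Delta \vdash C = C'' \otimes \{\eta \mapsto \exists[\Delta' \mid C'].\tau\} : Store}
{\Delta; C \vdash unpack_{\Delta'}(\eta) \Longrightarrow \Delta, \Delta';\ C'' \otimes \{\eta \mapsto \tau\} @ C'}
$$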

Fig. 5. Static Semantics: Coercions 



is reflected in the typing rule for free by requiring that the remaining instructions
be well-formed in a context $C'$ that does not include the location $\eta$.

As a warm-up example, consider the process of allocating and initializing a 
pair of pairs, where the deeper pair is aliased. The comments on the right-hand 
side present a portion of the type checking context after each program point. 

    new ρx, x, 2;     % x:ptr(ρx)
                      % {ρx ↦ ⟨junk, junk⟩}
    new ρy, y, 2;     % x:ptr(ρx), y:ptr(ρy)
                      % {ρx ↦ ⟨junk, junk⟩} ⊗
                      % {ρy ↦ ⟨junk, junk⟩}
    x.1 := y;         % x:ptr(ρx), y:ptr(ρy)
                      % {ρx ↦ ⟨ptr(ρy), junk⟩} ⊗
                      % {ρy ↦ ⟨junk, junk⟩}
    x.2 := y;         % x:ptr(ρx), y:ptr(ρy)
                      % {ρx ↦ ⟨ptr(ρy), ptr(ρy)⟩} ⊗
                      % {ρy ↦ ⟨junk, junk⟩}
At each update operation, the type checker verifies that x has a pointer type
and modifies the type of x's memory block accordingly. The interesting aspect of
this example is that after the fourth instruction in the sequence, there are three
aliases to the second memory block: the variable y and the two components of x.
We can see this is true simply by counting the number of occurrences of $ptr(\rho_y)$
in the type checking context. Each occurrence must alias the others. All three
aliases are accurately tracked in the type system and any of them may be used.
When we are finished with the data structure, we may deallocate it:

    free y;           % x:ptr(ρx), y:ptr(ρy)
                      % {ρx ↦ ⟨ptr(ρy), ptr(ρy)⟩}
    free x;           % x:ptr(ρx), y:ptr(ρy)
                      % ∅



After deallocation, we are left with two dangling pointers, one to the deallocated
location $\rho_x$ and a second to the deallocated location $\rho_y$. Fortunately, the type
checker prevents these pointers from being dereferenced. For example, if the next
instruction in the sequence were the projection let z = x.1, it would fail to type check since there
is no constraint $C'$ such that $\emptyset = C' \otimes \{\rho_x \mapsto \langle \tau_1, \ldots, \tau_n \rangle\}$.
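
The warm-up example can be replayed with a small Python sketch of the store-tracking part of the type checker. This is an illustration under simplifying assumptions, not the paper's type system: it handles only straight-line new, update, projection and free instructions, and the tuple encoding of instructions and the names `check`, `block_of` and `StoreTypeError` are invented for this sketch.

```python
class StoreTypeError(Exception):
    """Raised when an instruction is rejected by the sketch checker."""


def block_of(gamma, store, var):
    """Return the field types of the block that `var` points to."""
    ty = gamma.get(var)
    if not (isinstance(ty, tuple) and ty[0] == "ptr"):
        raise StoreTypeError(f"{var} does not have a pointer type")
    if ty[1] not in store:
        raise StoreTypeError(f"no constraint describes location {ty[1]}")
    return store[ty[1]]


def field(block, i):
    """Convert a 1-based field number to a list index, checking its range."""
    if not 1 <= i <= len(block):
        raise StoreTypeError(f"field {i} is out of range")
    return i - 1


def check(program):
    """Check a straight-line program; return the final (gamma, store)."""
    delta = set()   # location variables introduced so far
    gamma = {}      # value variable -> type
    store = {}      # location -> field types; a dict keeps each location unique
    for instr in program:
        op = instr[0]
        if op == "new":                      # ("new", loc, var, size)
            _, loc, var, size = instr
            if loc in delta or var in gamma:
                raise StoreTypeError("new requires fresh names")
            delta.add(loc)
            store[loc] = ["junk"] * size
            gamma[var] = ("ptr", loc)
        elif op == "assign":                 # ("assign", var, i, source)
            _, var, i, source = instr
            block = block_of(gamma, store, var)
            block[field(block, i)] = gamma[source]   # the field's type changes
        elif op == "let":                    # ("let", new_var, var, i)
            _, new_var, var, i = instr
            block = block_of(gamma, store, var)
            if new_var in gamma:
                raise StoreTypeError("let requires a fresh variable")
            gamma[new_var] = block[field(block, i)]
        elif op == "free":                   # ("free", var)
            block_of(gamma, store, instr[1])
            del store[gamma[instr[1]][1]]    # the constraint is consumed
        else:
            raise StoreTypeError(f"unknown instruction {op}")
    return gamma, store


warm_up = [("new", "rx", "x", 2), ("new", "ry", "y", 2),
           ("assign", "x", 1, "y"), ("assign", "x", 2, "y")]
print(check(warm_up)[1])
```

Appending `("free", "y")`, `("free", "x")` and then `("let", "z", "x", 1)` to `warm_up` makes `check` raise `StoreTypeError`: x still has a pointer type in the value context, but no constraint for its location remains, which is the situation described above.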

Control-flow Instructions. The typing of the case expression is somewhat unusual.
Operationally, case checks the first field of the memory block in the location
pointed to by a value $v$. If the first field is a 1, execution continues with the first
instruction sequence, and if it is a 2, execution continues with the second
instruction sequence. However, the memory type constructor $\langle \cdots \rangle$ will not be the
top-most type constructor (otherwise, the case would be unnecessary). The type
system expects a union type to be top-most, and each alternative may contain
some number (possibly zero) of existential quantifiers to abstract the store
encapsulated in that alternative. The underlying memory value must have either
tag 1 or tag 2 in its first field.

Because the language has been defined in continuation-passing style, every
instruction sequence is terminated either by a function call $v(v_1, \ldots, v_n)$ or by a call
to the terminal continuation halt, which requires an integer argument. Function
calls are well-formed if the polymorphic function $v$ has been fully instantiated,
the constraints in the current context equal the constraints required by the
function, and the argument types match the types of the function parameters.

3.3 Coercions 

The last instruction, coerce($\gamma$), applies a typing coercion to the store. Coercions,
unlike the other instructions, are for type-checking purposes only. Intuitively,
coercions may be erased before executing a program and the run-time behaviour
will not be affected. The judgement form $\Delta; C \vdash \gamma \Longrightarrow \Delta'; C'$ indicates that a
coercion is well-formed, extends the type context to $\Delta'$, and produces new store
constraints $C'$. These judgements are presented in Fig. 5.

Each coercion operates on a particular object in place at a location $\eta$. The
union coercion lifts the object at $\eta$ into a union type and the roll/unroll
coercions witness the isomorphism between a recursive type and its unfolding. The
coercion $pack_{[c_1, \ldots, c_n \mid C'[c_1, \ldots, c_n/\Delta']]\ as\ \exists[\Delta' \mid C'].\tau}(\eta)$ introduces an existential type
by hiding the type constructors $c_1, \ldots, c_n$ and encapsulating the store described
by $C'[c_1, \ldots, c_n/\Delta']$. The unpack coercion eliminates an existential type, binds
the variables in the context $\Delta'$ (so these variables may be used in the following
instructions) and augments the current constraints with the encapsulated $C'$.

4 Applications 

In this section, we show how our language can be used to encode two common
programming patterns: the destination-passing style pattern, which constructs
data structures efficiently, and the Deutsch-Schorr-Waite or “link-reversal”
patterns, which traverse data structures using minimal additional space.

4.1 Destination-Passing Style

The destination-passing style (DPS) transformation detects a certain form of
“almost-tail-recursive” function and automatically transforms it into an efficient
properly tail-recursive function. The transformation improves many functional
programs significantly, leading a number of researchers to study the problem in
depth. Our contribution is to provide a type system that can be
used in a type-preserving compiler and is capable of verifying that the code
resulting from the transformation is safe.

Append is the canonical example of a function suitable for DPS: 

    fun append (xs,ys) =
      case xs of
        [] -> ys
      | hd :: tl -> hd :: append (tl,ys)

Here, the second-last operation in the second arm of the case is a function call 
and the last operation constructs a cons cell. If the two operations were inverted, 
we would have an efficient tail-recursive function. In DPS, the function allocates 
a cons cell before the recursive call and passes the partially uninitialized value 
to the function, which computes its result and fills in the uninitialized part of 
the data structure. If the input list xs is linear, it will not be used in the future. 
In this case, it is possible to further optimize the program by reusing the input 
list cells for the output list. Our example performs both of these optimizations. 
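
The idea of allocating the cons cell first and filling in its tail later can be sketched in Python. This is only an illustration of the destination-passing idea under stated assumptions: the `Cons` class and helper names are invented here, `None` stands both for the empty list and for a not-yet-initialized tail, the recursion is written as a loop, and fresh cells are allocated rather than reusing the input cells as the paper's optimized version does.

```python
class Cons:
    """A mutable cons cell; `tail` may be filled in after allocation."""

    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail


def from_list(items):
    """Build a linked list from a Python list."""
    result = None
    for item in reversed(items):
        result = Cons(item, result)
    return result


def to_list(cell):
    """Collect the elements of a linked list into a Python list."""
    items = []
    while cell is not None:
        items.append(cell.head)
        cell = cell.tail
    return items


def append_dps(xs, ys):
    """Append ys to xs, passing along the cell whose tail is still a hole."""
    if xs is None:                 # the wrapper handles the empty input list
        return ys
    start = Cons(xs.head)          # first output cell; its tail is the hole
    dest = start
    xs = xs.tail
    while xs is not None:
        cell = Cons(xs.head)       # allocate before "recursing"
        dest.tail = cell           # fill in the previous hole
        dest = cell                # the new cell is the next destination
        xs = xs.tail
    dest.tail = ys                 # the final hole receives the second list
    return start


print(to_list(append_dps(from_list([1, 2, 3]), from_list([4, 5]))))
```

Here `start` and `dest` play roles similar to the start and prev parameters of the optimized function discussed below: one alias remembers the head of the result while the other tracks the partially initialized cell.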

Before presenting the code for the optimized function, we will need to define
a number of abbreviations. Such abbreviations not only aid readability, but also
help compress typing information in a compiler. First, recall the type of
integer lists List and their unrolling List':

$$List = \mu\alpha.\langle S(1) \rangle \cup \exists[\rho \mid \{\rho \mapsto \alpha\}].\langle S(2), int, ptr(\rho) \rangle$$

$$List' = \langle S(1) \rangle \cup \exists[\rho \mid \{\rho \mapsto List\}].\langle S(2), int, ptr(\rho) \rangle$$
Given these list definitions, it will be useful to define the following composite
coercion.

$$
\begin{array}{l}
rollList\ \rho_1\ packing\ \rho_2 = \\
\quad pack_{[\rho_2 \mid \{\rho_2 \mapsto List\}]\ as\ \exists[\rho_2 \mid \{\rho_2 \mapsto List\}].\langle S(2), int, ptr(\rho_2) \rangle}(\rho_1); \\
\quad union_{List'}(\rho_1); \\
\quad roll_{List}(\rho_1)
\end{array}
$$

This coercion operates on a portion of the store with shape

$$\{\rho_1 \mapsto \langle S(2), int, ptr(\rho_2) \rangle\} \otimes \{\rho_2 \mapsto List\}.$$

It packs up $\rho_2$ into an existential around $\rho_1$, lifts the resultant object up to a
union type and finally rolls it up, producing a store with the shape $\{\rho_1 \mapsto List\}$.

The function append', presented in Fig. 6, implements the inner loop of the
optimized append function. A wrapper function must check for the case that the
input list is empty. If not, it passes two pointers to the beginning of the first list
(aliases of one another) to append' for parameters prev and start. It also passes
a pointer to the second element in that list for parameter xs and a pointer to
the second list for parameter ys. Notice that the contents of location $\rho_s$ are not
described by the aliasing constraints. On the first iteration of the loop, $\rho_s$ is an
alias of $\rho_p$, and on successive iterations it is abstracted by $\epsilon$. However, these facts
are not explicit in the type structure and therefore $\rho_s$ cannot be used during
any iteration of the loop (cont will be aware that $\rho_s$ equals $\rho_p$ and may use the
resultant list).

The first place to look to understand this code is at the aliasing constraints,
which act as a loop invariant. Reading the constraints in the type from left to
right reveals that the function expects a store with some unknown part ($\epsilon$) as
well as a known part. The known part contains a cons cell at location $\rho_p$ that is
linked to a List in location $\rho_{xs}$. Independent of either of these objects is a third
location, $\rho_{ys}$, which also contains a List.

The first instruction in the function unrolls the recursive type of the object
at $\rho_{xs}$ to reveal that it is a union and can be eliminated by a case statement. In
the first branch of the case, xs must point to null. The code frees the null cell,
resulting in a store at program point 1 that can be described by the constraints
$\epsilon \otimes \{\rho_p \mapsto \langle S(2), int, ptr(\rho_{xs}) \rangle\} \otimes \{\rho_{ys} \mapsto List\}$. Observe that the cons cell at
$\rho_p$ contains a dangling pointer to memory location $\rho_{xs}$, the location that has
just been freed and no longer appears in the constraints. Despite the dangling
pointer, the code is perfectly safe: the typing rules prevent the pointer from
being used.

Next, the second list ys is stored destructively into the cons cell at $\rho_p$. Hence, at program
point 2, the store has a shape described by $\epsilon \otimes \{\rho_p \mapsto \langle S(2), int, ptr(\rho_{ys}) \rangle\} \otimes
\{\rho_{ys} \mapsto List\}$. The type of the cons cell at $\rho_p$ is different here than at 1, reflecting
the new link structure of the store. The tail of the cell no longer points to location
$\rho_{xs}$, but to $\rho_{ys}$ instead. After packing and rolling using the composite coercion,
the store can be described by $\epsilon \otimes \{\rho_p \mapsto List\}$. This shape equals the shape
expected by the continuation (see the definition of $\tau_c$), so the function call is
valid.
    fix append' [ε, ρxs, ρys, ρp, ρs |
                 ε ⊗ {ρp ↦ ⟨S(2), int, ptr(ρxs)⟩, ρxs ↦ List, ρys ↦ List}].
                (xs : ptr(ρxs), ys : ptr(ρys), prev : ptr(ρp), start : ptr(ρs),
                 cont : τc[ε, ρp, ρs]).
      unroll(ρxs);
      case xs
      ( inl ⇒
          free xs;                        % 1.
          prev.3 := ys;                   % 2.
          rollList ρp packing ρys;        % 3.
          cont(start)
      | inr ⇒
          unpack_{ρtl}(ρxs);              % 4.
          let tl = xs.3;                  % 5.
          append' [ε ⊗ {ρp ↦ ⟨S(2), int, ptr(ρxs)⟩}, ρtl, ρys, ρxs, ρs]
                  (tl, ys, xs, start, cont'))

    where τc[ε, ρp, ρs] = ∀[· | ε ⊗ {ρp ↦ List}].(ptr(ρs)) → 0



Fig. 6. Optimized Append 



In the second branch of the case, xs must point to a cons cell. The existential
containing the tail of the list is unpacked and at program point 4, the store has
shape $\epsilon \otimes \{\rho_p \mapsto \langle S(2), int, ptr(\rho_{xs}) \rangle\} \otimes \{\rho_{xs} \mapsto \langle S(2), int, ptr(\rho_{tl}) \rangle\} \otimes \{\rho_{tl} \mapsto
List\} \otimes \{\rho_{ys} \mapsto List\}$. It is now possible to project the tail of xs. To complete
the loop, the code uses polymorphic recursion. At the end of the second branch,
the constraint variable $\epsilon$ for the next iteration of the loop is instantiated with
the current $\epsilon$ and the contents of location $\rho_p$, hiding the previous node in the
list. The location variables $\rho_{xs}$ and $\rho_p$ are instantiated to reflect the shift to the
next node in the list. The locations $\rho_{ys}$ and $\rho_s$ are invariant around the loop and
therefore are instantiated with themselves.

The last problem is how to define the continuation cont' for the next iteration.
The function should be tail-recursive, so we would like to use the continuation
cont. However, close inspection reveals that the next iteration of append' requires
a continuation with type $\tau_c[\epsilon \otimes \{\rho_p \mapsto \langle S(2), int, ptr(\rho_{xs}) \rangle\}, \rho_{xs}, \rho_s]$ but that the
continuation cont has type $\tau_c[\epsilon, \rho_p, \rho_s]$. The problem is that this iteration of
the recursion has unrolled and unpacked the recursive data structure pointed
to by xs, but before “returning” by calling the continuation, the list must be
packed and rolled back up again. Therefore, the appropriate definition of cont' is
cont ∘ (rollList $\rho_p$ packing $\rho_{xs}$). Once the continuation packs $\rho_{xs}$ and rolls the
contents of location $\rho_p$ into a List, the constraints satisfy the requirements of
the continuation cont. Semantically, cont' is equivalent to the following function.

    fix _ [· | ε ⊗ {ρp ↦ ⟨S(2), int, ptr(ρxs)⟩, ρxs ↦ List}]
          (start : ptr(ρs)).
      rollList ρp packing ρxs;
      cont(start)

However, because coercions can be erased before running a program, it is simple
to arrange for cont' to be implemented by cont.



4.2 Deutsch-Schorr-Waite Algorithms

Deutsch-Schorr-Waite or “link reversal” algorithms are well-known algorithms
for traversing data structures while incurring minimal additional space overhead.
These algorithms were first developed for executing the mark phase of a garbage
collector. During garbage collection, there is little or no extra space available
for storing control information, so minimizing the overhead of the traversal is a
must. Recent work by Sobel and Friedman has shown how to automatically
transform certain continuation-passing style programs, those generated by
anamorphisms, into link-reversal algorithms. Here we give an example of how to
encode a link-reversal algorithm in our calculus.

For this application, we will use the definition of trees given earlier:

$$Tree = \mu\alpha.\langle S(1) \rangle \cup \exists[\rho_L, \rho_R \mid \{\rho_L \mapsto \alpha, \rho_R \mapsto \alpha\}].\langle S(2), ptr(\rho_L), ptr(\rho_R) \rangle$$

$$Tree' = \langle S(1) \rangle \cup \exists[\rho_L, \rho_R \mid \{\rho_L \mapsto Tree, \rho_R \mapsto Tree\}].\langle S(2), ptr(\rho_L), ptr(\rho_R) \rangle$$

The code for the algorithm appears in Fig. 7. The trick to the algorithm is
that when recursing into the left subtree, it uses space normally reserved for 
a pointer to that subtree to point back to the parent node. Similarly, when 
recursing into the right subtree, it uses the space for the right pointer. In both 
cases, it uses the tag field of the data structure to store a continuation that knows 
what to do next (recurse into right subtree or follow the parent pointers back 
up the tree). Before ascending back up out of the tree, the algorithm restores 
the link structure to a proper tree shape and the type system checks this is done 
properly. Notice that all of the functions and continuations are closed, so there 
is no stack hiding in the closures. 
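
The link-reversal idea can be sketched in Python as follows. This is an illustrative variant rather than a transcription of Fig. 7: the `Node` class and its integer `state` field are assumptions of this sketch (the paper stores a continuation in the tag position instead), leaves are represented by `None`, and the traversal records node values in preorder.

```python
class Node:
    """A binary tree node; None stands for a leaf."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
        self.state = 0   # 0: link structure intact, 1: inside left, 2: inside right


def shape(node):
    """Snapshot of the tree as nested tuples, used to compare link structure."""
    if node is None:
        return None
    return (node.value, shape(node.left), shape(node.right))


def traverse(root):
    """Preorder traversal that keeps the path to the root in reversed links."""
    order = []
    parent, cur = None, root
    descending = True
    while True:
        if descending:
            if cur is None:
                descending = False        # reached a leaf; start ascending
            else:
                order.append(cur.value)
                cur.state = 1
                nxt = cur.left
                cur.left = parent         # store parent pointer as left subtree
                parent, cur = cur, nxt
        elif parent is None:
            return order                  # back at the root with nothing above
        elif parent.state == 1:
            up = parent.left              # parent pointer saved in the left field
            parent.left = cur             # restore left subtree
            nxt = parent.right
            parent.right = up             # store parent pointer as right subtree
            parent.state = 2
            cur = nxt
            descending = True
        else:
            up = parent.right             # parent pointer saved in the right field
            parent.right = cur            # restore right subtree
            parent.state = 0              # restore the tag
            cur, parent = parent, up


tree = Node(1, Node(2, Node(4), None), Node(3))
print(traverse(tree))
```

Apart from the per-node `state` field, the traversal uses only a constant number of local variables, and the tree's original links are restored by the time it returns.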



5 Operational Semantics and Type Soundness 

In this section, we define the syntax and static semantics of the values manipu- 
lated at run-time, including pointers, memory blocks and the store and give an 
operational semantics for the language. The type system is sound with respect 
to this semantics. 
    letrec walk[ε, ρ1, ρ2 | ε ⊗ {ρ1 ↦ Tree}]                % Traverse a tree node
               (t : ptr(ρ1), up : ptr(ρ2), cont : τc[ε, ρ1, ρ2]).
      unroll(ρ1);
      case t of
      ( inl ⇒
          union_{Tree'}(ρ1); roll_{Tree}(ρ1); cont(t, up)
      | inr ⇒
          unpack_{ρL,ρR}(ρ1);
          t.1 := cont;                                       % store cont in tag position
          let left = t.2;
          t.2 := up;                                         % store parent pointer as left subtree
          walk[ε ⊗ {ρ1 ↦ ⟨τc[ε,ρ1,ρ2], ptr(ρ2), ptr(ρR)⟩} ⊗ {ρR ↦ Tree}, ρL, ρ1]
              (left, t, rwalk[ε, ρ1, ρ2, ρL, ρR]))

    and rwalk[ε, ρ1, ρ2, ρL, ρR |                            % Walk the right-hand subtree
              ε ⊗ {ρ1 ↦ ⟨τc[ε,ρ1,ρ2], ptr(ρ2), ptr(ρR)⟩} ⊗
              {ρL ↦ Tree} ⊗ {ρR ↦ Tree}]
             (left : ptr(ρL), t : ptr(ρ1)).
      let up = t.2;
      t.2 := left;                                           % restore left subtree
      let right = t.3;
      t.3 := up;                                             % store parent pointer as right subtree
      walk[ε ⊗ {ρ1 ↦ ⟨τc[ε,ρ1,ρ2], ptr(ρL), ptr(ρ2)⟩} ⊗ {ρL ↦ Tree}, ρR, ρ1]
          (right, t, finish[ε, ρ1, ρ2, ρL, ρR])

    and finish[ε, ρ1, ρ2, ρL, ρR |                           % Reconstruct tree node and return
               ε ⊗ {ρ1 ↦ ⟨τc[ε,ρ1,ρ2], ptr(ρL), ptr(ρ2)⟩} ⊗
               {ρL ↦ Tree} ⊗ {ρR ↦ Tree}]
              (right : ptr(ρR), t : ptr(ρ1)).
      let up = t.3;
      t.3 := right;                                          % restore right subtree
      let cont = t.1;
      t.1 := S(2);                                           % restore tag
      pack_{ρL,ρR}(ρ1); union_{Tree'}(ρ1); roll_{Tree}(ρ1); cont(t, up)

    where τc[ε, ρ1, ρ2] = ∀[· | ε ⊗ {ρ1 ↦ Tree}].(ptr(ρ1), ptr(ρ2)) → 0



Fig. 7. Deutsch-Schorr-Waite tree traversal with constant space overhead
5.1 Run-Time Values

First, we extend the class of small values to include the junk object junk and
pointers $ptr(\ell)$. Next, we define a class of stored values ($s$) that include memory
blocks $\langle v_1, \ldots, v_n \rangle$ and witnessed values $\varsigma(s)$. Witnessed values are introduced
by coercions. There is one witness for each of the roll, union and pack coercions.

$$
\begin{array}{llcl}
\textit{small values} & v & ::= & \cdots \mid junk \mid ptr(\ell) \\
\textit{stored values} & s & ::= & \langle v_1, \ldots, v_n \rangle \mid \varsigma(s) \\
\textit{witnesses} & \varsigma & ::= & union_{\tau_1 \cup \tau_2} \mid pack_{[c_1, \ldots, c_n \mid S]\ as\ \exists[\Delta \mid C].\tau} \mid \\
 & & & roll_{(rec\ \alpha\ (\Delta).\tau)\ (c_1, \ldots, c_n)}
\end{array}
$$

The well-formedness of junk and pointers is established using the same judgement
form as other values. Stored values use the judgement $\vdash s : \tau$. Since stored
values only appear at run time, when type and value variables have been
substituted away, they are always checked in an empty context. Figure 8 formalizes
these two judgements.




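$$
\frac{\cdot; \cdot \vdash v_1 : \tau_1 \ \cdots\ \cdot; \cdot \vdash v_n : \tau_n}
{\vdash \langle v_1, \ldots, v_n \rangle : \langle \tau_1, \ldots, \tau_n \rangle}
$$

$$
\frac{\cdot \vdash \tau_1 \cup \tau_2 : Type \qquad \vdash s : \tau_1 \ \text{or}\ \vdash s : \tau_2}
{\vdash union_{\tau_1 \cup \tau_2}(s) : \tau_1 \cup \tau_2}
$$

$$
\frac{\cdot \vdash \tau = (rec\ \alpha\ (\Delta).\tau')\ (c_1, \ldots, c_n) : Type
\qquad \vdash s : \tau'[rec\ \alpha\ (\Delta).\tau'/\alpha][c_1, \ldots, c_n/\Delta]}
{\vdash roll_{\tau}(s) : \tau}
$$

$$
\frac{\begin{array}{c}
\Delta = \beta_1{:}\kappa_1, \ldots, \beta_n{:}\kappa_n
\qquad \cdot \vdash c_i : \kappa_i \quad (\text{for } 1 \le i \le n) \\
\vdash S : C[c_1, \ldots, c_n/\Delta]
\qquad \vdash s : \tau[c_1, \ldots, c_n/\Delta]
\end{array}}
{\vdash pack_{[c_1, \ldots, c_n \mid S]\ as\ \exists[\Delta \mid C].\tau}(s) : \exists[\Delta \mid C].\tau}
$$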



Fig. 8. Static Semantics: Run-time Values 



5.2 Store and Program Typing 

The pack coercion encapsulates a portion of the store, S, which is a finite partial 
mapping from concrete locations to stored values. We treat stores equivalent 
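$$(S, \iota) \longmapsto_P (S', \iota')$$

$$
\begin{array}{lcl}
(S,\ new\ \rho, x, i;\ \iota) & \longmapsto_P & (S\{\ell \mapsto s\},\ \iota[\ell/\rho][ptr(\ell)/x]) \\
\quad \text{where } \ell \notin S, \iota \text{ and } s = \langle junk_1, \ldots, junk_i \rangle & & \\
(S\{\ell \mapsto \langle v_1, \ldots, v_n \rangle\},\ free\ ptr(\ell);\ \iota) & \longmapsto_P & (S,\ \iota) \\
(S\{\ell \mapsto \langle v_1, \ldots, v_n \rangle\},\ let\ x = ptr(\ell).i;\ \iota) & \longmapsto_P & (S\{\ell \mapsto \langle v_1, \ldots, v_n \rangle\},\ \iota[v_i/x]) \\
\quad \text{where } 1 \le i \le n & & \\
(S\{\ell \mapsto \langle v_1, \ldots, v_i, \ldots, v_n \rangle\},\ ptr(\ell).i := v';\ \iota) & \longmapsto_P & (S\{\ell \mapsto \langle v_1, \ldots, v', \ldots, v_n \rangle\},\ \iota) \\
\quad \text{where } 1 \le i \le n & & \\
(S\{\ell \mapsto s\},\ case\ ptr(\ell)\ (inl \Rightarrow \iota_1 \mid inr \Rightarrow \iota_2)) & \longmapsto_P & (S\{\ell \mapsto s'\},\ \iota_i) \\
\quad \text{where } i = 1 \text{ or } 2, & & \\
\quad s = union_{\tau_1 \cup \tau_2}(\varsigma_1(\cdots \varsigma_m(\langle S(i), v_1, \ldots, v_n \rangle) \cdots)), & & \\
\quad s' = \varsigma_1(\cdots \varsigma_m(\langle S(i), v_1, \ldots, v_n \rangle) \cdots) & & \\
(S,\ v(v_1, \ldots, v_n)) & \longmapsto_P & (S,\ \theta(\iota)) \\
\quad \text{where } v = v'[c_1, \ldots, c_m], & & \\
\quad v' = fix\ f[\Delta \mid C](x_1{:}\tau_1, \ldots, x_n{:}\tau_n).\iota, & & \\
\quad \theta = [c_1, \ldots, c_m/\Delta][v'/f][v_1, \ldots, v_n/x_1, \ldots, x_n] & & \\
(S,\ coerce(\gamma);\ \iota) & \longmapsto_P & (S',\ \theta(\iota)) \\
\quad \text{where } \gamma(S) \longmapsto_\gamma S', \theta & &
\end{array}
$$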



Fig. 9. Operational Semantics: Programs 

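up to reordering of their elements and use the notation $S\{\ell \mapsto s\}$ to denote
the extension of $S$ with the mapping $\{\ell \mapsto s\}$. The notation is undefined if
$\ell \in Dom(S)$. The store well-formedness judgement is written $\vdash S : C$ and is
given below.

$$
\frac{\begin{array}{c}
S = \{\ell_1 \mapsto s_1, \ldots, \ell_n \mapsto s_n\}
\qquad \cdot \vdash C = \{\ell_1 \mapsto \tau_1, \ldots, \ell_n \mapsto \tau_n\} : Store \\
\vdash s_1 : \tau_1 \ \cdots\ \vdash s_n : \tau_n
\end{array}}
{\vdash S : C}
$$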

To prove that execution of our abstract machine cannot get stuck, we must
know that there can be no duplication of locations in the domain of the store
or in any encapsulated store. We call this property Global Uniqueness and it
depends upon an auxiliary definition of the Global Store Locations.

Footnote: Alternatively, we could have allowed locations in the store to alpha-convert. We did
not choose this route because we wanted to show that our operational semantics does
not implicitly copy portions of the store. Alpha-conversion would have obscured this
fact since accidental copying of an existential value would have implicitly allocated
additional store locations. As it stands, accidental copying of existentials would
create duplicate store locations, which Global Uniqueness asserts never happens.
Definition 1 (Global Store Locations). $L(S)$ is the multi-set given by the
following definition.

$$
\begin{array}{lcl}
L(\{\ell_1 \mapsto s_1, \ldots, \ell_n \mapsto s_n\}) & = & \{\ell_1, \ldots, \ell_n\} \uplus L(s_1) \uplus \cdots \uplus L(s_n) \\
L(pack_{[c_1, \ldots, c_n \mid S]\ as\ \tau}(s)) & = & L(S) \uplus L(s) \\
L(x) & = & L(x_1) \uplus \cdots \uplus L(x_n) \quad \text{for any other term construct } x
\end{array}
$$

where $x_1, \ldots, x_n$ are the subcomponents of $x$.

Definition 2 (Global Uniqueness). $GU(S)$ if and only if there are no duplicate
locations in $L(S)$.
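
Definitions 1 and 2 can be made concrete with a short Python sketch. The tagged-tuple encoding of stores and stored values, and the names `locations` and `globally_unique`, are assumptions of this illustration; only store domains contribute locations, so pointers inside memory blocks are not counted.

```python
from collections import Counter

# Hypothetical encoding:
#   ("store", {location: stored_value})
#   ("pack", encapsulated_store, stored_value)
#   ("union", stored_value) and ("roll", stored_value) for the other witnesses
#   ("block", [small values]) for a memory block


def locations(term):
    """Multiset of store-domain locations occurring in a store or stored value."""
    kind = term[0]
    if kind == "store":
        result = Counter(term[1].keys())
        for stored in term[1].values():
            result += locations(stored)
        return result
    if kind == "pack":
        return locations(term[1]) + locations(term[2])
    if kind in ("union", "roll"):
        return locations(term[1])
    return Counter()   # memory blocks hold only small values


def globally_unique(store):
    """GU(S): no location occurs more than once in the multiset."""
    return all(count == 1 for count in locations(store).values())


inner = ("store", {"l2": ("block", [1])})
good = ("store", {"l1": ("pack", inner, ("block", [2, ("ptr", "l2")]))})
bad = ("store", {"l1": ("pack", inner, ("block", [2, ("ptr", "l2")])),
                 "l2": ("block", [1])})
print(globally_unique(good), globally_unique(bad))
```

In `bad`, location `l2` appears both in the outer store and in the store encapsulated by the pack witness, which is the kind of duplication that Global Uniqueness rules out.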

A program is a store paired with an instruction stream. A program is
well-formed, written $\vdash (S, \iota)$, under the following circumstances.

Definition 3 (Well-formed Program). $\vdash (S, \iota)$ iff

1. The store adheres to global uniqueness: $GU(S)$.

2. There exist constraints $C$ such that $\vdash S : C$.

3. The instructions are well-formed with the given constraints: $\cdot; C; \cdot \vdash \iota$.

5.3 Operational Semantics 

The small-step operational semantics for the language is given by a function
$P \longmapsto_P P'$. The majority of the operational rules are entirely standard and
formalize the intuitive rules described earlier in the paper. The operational rule
for the coerce instruction depends upon a separate semantics for coercions that
has the form $\gamma(S) \longmapsto_\gamma S', \theta$, where $\theta$ is a substitution of type constructors for type
constructor variables. Inspection of these rules reveals that coercions do not
alter the association between locations and memory blocks; they simply insert
witnesses that alter the typing derivation so that it is possible to prove a type
soundness result. The rules for program and coercion operational semantics may
be found in Figures 9 and 10.

5.4 Type Soundness 

We now have all the pieces necessary to state and prove that execution of a
program in our language “can’t get stuck.” A stuck program is a program that
is not in the terminal configuration halt $i$ and for which no operational rule
applies.

Theorem 1 (Type Soundness).

If $\vdash (S, \iota)$ and $(S, \iota) \longmapsto_P^{*} (S', \iota')$ then $(S', \iota')$ is not stuck.

The proof itself uses standard Subject Reduction and Progress lemmas in the
style popularized by Wright and Felleisen and is mostly mechanical. Due to
space limitations, it has not been included. See Walker’s thesis for details.
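$$\gamma(S) \longmapsto_\gamma S', \theta$$

$$
\begin{array}{lcl}
union_{\tau_1 \cup \tau_2}(\ell)(S\{\ell \mapsto s\}) & \longmapsto_\gamma & S\{\ell \mapsto union_{\tau_1 \cup \tau_2}(s)\},\ [\,] \\
roll_{\tau}(\ell)(S\{\ell \mapsto s\}) & \longmapsto_\gamma & S\{\ell \mapsto roll_{\tau}(s)\},\ [\,] \\
unroll(\ell)(S\{\ell \mapsto roll_{\tau}(s)\}) & \longmapsto_\gamma & S\{\ell \mapsto s\},\ [\,] \\
pack_{[c_1, \ldots, c_n \mid C]\ as\ \tau}(\ell)(S\{\ell \mapsto s\}S') & \longmapsto_\gamma & S\{\ell \mapsto pack_{[c_1, \ldots, c_n \mid S']\ as\ \tau}(s)\},\ [\,] \\
\quad \text{where } C = \{\ell_1 \mapsto \tau_1, \ldots, \ell_m \mapsto \tau_m\} \text{ and } S' = \{\ell_1 \mapsto s_1, \ldots, \ell_m \mapsto s_m\} & & \\
unpack_{\Delta}(\ell)(S\{\ell \mapsto pack_{[c_1, \ldots, c_n \mid S']\ as\ \exists[\Delta \mid C].\tau}(s)\}) & \longmapsto_\gamma & SS'\{\ell \mapsto s\},\ [c_1, \ldots, c_n/\Delta]
\end{array}
$$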

Fig. 10. Operational Semantics: Coercions 



6 Discussion 

Our alias type system has intermediate expressiveness when compared with other
frameworks for reasoning about aliasing. It is more powerful than simple
k-limiting approaches, as it provides the ability to represent recursive data
structures, but much less powerful than recent approaches based on Hoare logic.
In the following sections we explain some of the limitations of this work
and discuss related research in more depth.



6.1 Limitations 

There are at least four significant limitations of the work described in this paper: 

1. A lack of may-alias constraints. 

2. Limited support for union types. 

3. Limited logic (no implication, disjunction, negation or equality predicates) 
for compile-time reasoning about store shapes. 

4. Limited coercions. 

The first limitation refers to the fact that no element in a store type may alias 
any other element of the store type. This property is what makes deallocation 
safe, but it also prevents us from writing many useful functions. For example, it 
is impossible to write a function of two or more arguments where each argument 
may or may not alias the others. It is also not possible to write a general graph 
or DAG in this language. In earlier work with Fred Smith, we describe how
to add may-alias constraints to a simpler language and we have done this in our
Typed Assembly Language implementation. The disadvantage of these
may-alias constraints is that destructive object manipulation, including deallocation,
is disallowed.

The second limitation refers to the fact that the introduction and elimination
rules for union types limit a programmer’s choice of data layout. Unions always
have the form:

$$\exists[\Delta_1 \mid C_1].\langle S(1), \tau_1, \ldots, \tau_k \rangle \ \cup\ \exists[\Delta_2 \mid C_2].\langle S(2), \tau'_1, \ldots, \tau'_j \rangle$$

The only way to eliminate a union is to case on the first component of the
memory block. There are many other ways to represent sum types: we may want
to separate the union tag from the data structure itself or we may not want a
tag at all, relying instead on pointer equality tests or the ability to differentiate
between nil and a pointer. In general, to separate the tag from the union type
requires more dependency mechanisms than we have in this language. However,
we can accommodate some additional data type representations by simply adding
new introduction and elimination forms; the type structure need not change
at all. For instance, an option type that is either a nil object or a reference to $\tau$
may be encoded as $S(0) \cup \langle \tau \rangle$. A test for zero eliminates this form of union.
Regardless of which additional choices we make here, we cannot represent
completely undiscriminated unions; we must use some dynamic test to determine
which element of a union a value inhabits before using that value.

The third limitation provides some interesting possibilities for future research.
In this work, we have developed a very simple, special-purpose logic for
reasoning about the store. Because our logic is so weak (it contains no implication,
negation or quantifiers, among other possibilities), there are many properties
of the store, such as the property of being a “balanced tree,” that we cannot
express. One avenue for improving the expressive power of our type system is
to follow the path laid out by Xi and Pfenning’s DML. They augment
a functional programming language with a general-purpose logic for specifying
program properties. The disadvantage of such an approach is that we would have
to integrate a theorem prover into our language in order to decide the validity
of the logical formulae.

Finally, the coercions we provide (e.g., fold and unfold) are sufficient to do
some programming tasks, but are by no means complete.



6.2 Related Work 



Our type system builds upon foundational work by other groups on syntactic 
control of interference m linear logic pni and linear type systems in functional 
programming languages 121 >14 21 1 1 1 hl.'Sl^iitn] . 

Our research also has much in common with efforts to define program logics
for reasoning about aliasing. In particular, if we view propositions
as types, there are striking similarities with recent work by Reynolds, who
builds on earlier research by Burstall. Reynolds’ logic employs a “spatial
conjunction,” which, like our ⊕ operator, joins two operands that depend upon
disjoint portions of the store. Updating a single memory cell can alter at most
one of the propositions joined by Reynolds’ conjunction, making it possible to
state simple Hoare-logic rules for memory allocation, dereference and update.
Ishtiaq and O’Hearn have further analyzed Reynolds’ rules in the context
of the logic of bunched implications. They give a slightly different set of Hoare
rules and are able to prove that their rules generate weakest preconditions. They
also introduce an operation for safe object deallocation. As mentioned earlier,
one practical difference between these program logics and our type system is
that for our system there is no need to implement a theorem prover to check the
safety of programs. Consequently, we have found it straightforward to integrate
alias types with our Typed Assembly Language implementation.
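
The disjointness requirement shared by the spatial conjunction and the store-join operator can be sketched in Python. This is an illustrative model under stated assumptions, not either formal system: stores are plain dictionaries from location names to values, and join and update are hypothetical helper names.

```python
def join(store_a, store_b):
    """Join two stores, refusing to combine them if they share a location."""
    overlap = store_a.keys() & store_b.keys()
    if overlap:
        raise ValueError(f"stores share locations: {sorted(overlap)}")
    return {**store_a, **store_b}


def update(store, loc, value):
    """Return a copy of the store with one existing cell overwritten."""
    if loc not in store:
        raise KeyError(loc)
    new_store = dict(store)
    new_store[loc] = value
    return new_store


left = {"l1": 1}
right = {"l2": 2}

# Updating a cell owned by `left` cannot disturb anything described by `right`,
# because the two operands of the join have disjoint domains.
left_after = update(left, "l1", 5)
combined = join(left_after, right)
```

Because the operands are disjoint, a description of `right` that held before the update still holds afterwards, which is the property that keeps the update rules local.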

There are also similarities between our research and work on alias analysis
techniques for imperative languages. Our type system appears
most closely related to the shape analysis developed by Sagiv, Reps, and Wilhelm
(SRW), which has also been used to develop sophisticated pointer logics.
Although the precise relationship is currently unknown to us, it is clear that
several of the key features that make SRW shape analysis more effective than
similar alias analyses can be expressed in our type system. More specifically:



1. Unlike some other analyses, SRW shape nodes do not contain information
about concrete locations or the site where the node was allocated. Our type
system drops information about concrete locations using location
polymorphism.

2. SRW shape nodes are named with the set of program variables that point to 
that node. Our type system can only label a node with a single name, but 
we are able to express the fact that a set of program variables point to that 
node using the same singleton type for each program variable in the set. 

3. SRW shape nodes may be flagged as unshared. Linear types account for 
unshared shape nodes. 

4. A single SRW summary node describes many memory blocks, but through 
the process of materialization a summary node may split off a new, separate 
shape node. At least some summary nodes may be represented as recursive 
types in our framework and materialization can be explained by the process 
of unrolling and unpacking a recursive and existential type. 
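
The unrolling step mentioned in item 4 can be sketched as a substitution on a small type syntax. The encoding below is an assumption made for illustration: types are nested Python tuples, the existential packaging and pointer indirection of the actual type system are omitted, and substitute, unroll and int_list are hypothetical names.

```python
def substitute(ty, name, replacement):
    """Replace free occurrences of the type variable `name` in `ty`."""
    tag = ty[0]
    if tag == "var":
        return replacement if ty[1] == name else ty
    if tag == "rec":
        # An inner binder with the same name shadows the outer one.
        if ty[1] == name:
            return ty
        return ("rec", ty[1], substitute(ty[2], name, replacement))
    if tag in ("union", "tuple"):
        return (tag,) + tuple(substitute(t, name, replacement) for t in ty[1:])
    # Base types such as ("int",) or ("S", 0) contain no variables.
    return ty


def unroll(ty):
    """One unrolling step: rec a.t becomes t with a replaced by rec a.t."""
    if ty[0] != "rec":
        raise ValueError("only recursive types can be unrolled")
    _, name, body = ty
    return substitute(body, name, ty)


# A simplified integer list: either the singleton S(0) or a cell holding
# an int and the rest of the list.
int_list = ("rec", "a",
            ("union", ("S", 0), ("tuple", ("int",), ("var", "a"))))

one_step = unroll(int_list)
```

After one step the head cell is described explicitly while the tail is still summarized by the recursive type, which mirrors a summary node splitting off a separate shape node.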

One of the advantages of our approach is that our language makes it
straightforward to create dependencies between functions and data using store or
location polymorphism. For example, in our implementation of the Deutsch-Schorr-Waite
algorithm, we manipulate continuations that know how to reconstruct a
well-formed tree from the current heap structure and we are able to express this
dependence in the type system. Explicit manipulation of continuations is necessary
in sufficiently low-level typed languages such as Typed Assembly Language.

Several other authors have considered alternatives to pure linear type
systems that increase their flexibility. For example, Kobayashi extends
standard linear types with data-flow information and Minamide uses a linear
type discipline to allow programmers to manipulate “data structures with a
hole.” Minamide’s language allows users to write programs that are compiled
into destination-passing style. However, Minamide’s language is still quite
high-level; he does not show how to verify explicit pointer manipulation. Moreover,
neither of these type systems provides the ability to represent cyclic data
structures.

Tofte, Talpin, and others have explored the use of region-based
memory management. In their work, objects are allocated into one of several regions
of memory. When a region is deallocated, all the objects in that region are
deallocated too. Region-based memory management performs extremely well in
many circumstances, but unlike systems based on linear types, space is not, in
general, reused on a per-object basis. Moreover, regions cannot be encapsulated
inside recursive data structures. Recently, Crary, Walker and Morrisett
investigated an alternative region type system that reasons about aliasing between
regions using technology similar to what we use here to reason about aliasing between
individual objects. We believe that some of the techniques developed in this
paper will make it possible to capture regions in recursive data structures and we
are eager to investigate a combined region-alias type framework that can take
advantage of both forms of typed memory management.
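
The region discipline described above (allocate objects into a region, then free the whole region at once) can be sketched as follows. This is a toy model under stated assumptions: the Region class and its alloc, read and free methods are hypothetical, and handles are list indices rather than real addresses.

```python
class Region:
    """A toy region: objects are freed together, never one at a time."""

    def __init__(self, name):
        self.name = name
        self.objects = []
        self.live = True

    def alloc(self, value):
        """Allocate a value in this region and return a handle to it."""
        if not self.live:
            raise RuntimeError(f"region {self.name} has been deallocated")
        self.objects.append(value)
        return len(self.objects) - 1

    def read(self, handle):
        """Read an object; any access after free() is rejected."""
        if not self.live:
            raise RuntimeError(f"region {self.name} has been deallocated")
        return self.objects[handle]

    def free(self):
        """Deallocate the region, and with it every object it contains."""
        self.objects.clear()
        self.live = False


region = Region("r1")
first = region.alloc("node A")
second = region.alloc("node B")
value_before_free = region.read(second)
region.free()
```

Note that the model offers no way to free `first` while keeping `second`, which is the per-object reuse that linear types provide and regions, in general, do not.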

Acknowledgements. Fred Smith worked with us on the predecessor to this 
research and the many stimulating discussions we had together contributed to 
the current paper. Neal Glew and the anonymous reviewers for TIC’00 made
many helpful comments on an earlier draft of this paper. 
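A Type Well-Formedness and Equality

\[ \boxed{\Delta \vdash c : \kappa} \]

\[
\Delta \vdash \beta : \Delta(\beta)
\qquad
\Delta \vdash \ell : \mathsf{Loc}
\qquad
\Delta \vdash \emptyset : \mathsf{Store}
\]

\[
\frac{\Delta \vdash C : \mathsf{Store} \quad \Delta \vdash \eta : \mathsf{Loc} \quad \Delta \vdash \tau : \mathsf{Type}}
     {\Delta \vdash C \oplus \{\eta \mapsto \tau\} : \mathsf{Store}}
\qquad
\frac{\Delta \vdash C : \mathsf{Store} \quad \Delta \vdash \epsilon : \mathsf{Store}}
     {\Delta \vdash C \oplus \epsilon : \mathsf{Store}}
\]

\[
\Delta \vdash \mathsf{junk} : \mathsf{Type}
\qquad
\frac{\Delta \vdash \eta : \mathsf{Loc}}
     {\Delta \vdash \mathsf{ptr}(\eta) : \mathsf{Type}}
\qquad
\Delta \vdash \mathsf{int} : \mathsf{Type}
\qquad
\Delta \vdash S(i) : \mathsf{Type}
\]

\[
\frac{\Delta \vdash \tau_1 : \mathsf{Type} \quad \cdots \quad \Delta \vdash \tau_n : \mathsf{Type}}
     {\Delta \vdash \langle \tau_1, \ldots, \tau_n \rangle : \mathsf{Type}}
\qquad
\frac{\Delta \vdash \tau_1 : \mathsf{Type} \quad \Delta \vdash \tau_2 : \mathsf{Type}}
     {\Delta \vdash \tau_1 \cup \tau_2 : \mathsf{Type}}
\]

\[
\frac{\Delta, \Delta' \vdash C : \mathsf{Store} \quad \Delta, \Delta' \vdash \tau_1 : \mathsf{Type} \quad \cdots \quad \Delta, \Delta' \vdash \tau_n : \mathsf{Type} \quad (\mathit{Dom}(\Delta) \cap \mathit{Dom}(\Delta') = \emptyset)}
     {\Delta \vdash \forall[\Delta' \mid C].(\tau_1, \ldots, \tau_n) \rightarrow 0 : \mathsf{Type}}
\]

\[
\frac{\Delta, \Delta' \vdash C : \mathsf{Store} \quad \Delta, \Delta' \vdash \tau : \mathsf{Type} \quad (\mathit{Dom}(\Delta) \cap \mathit{Dom}(\Delta') = \emptyset)}
     {\Delta \vdash \exists[\Delta' \mid C].\tau : \mathsf{Type}}
\]

\[
\frac{\Delta, \alpha{:}(\mathit{Dom}(\Delta')) \rightarrow \mathsf{Type}, \Delta' \vdash \tau : \mathsf{Type} \quad (\mathit{Dom}(\Delta) \cap \mathit{Dom}(\Delta') = \emptyset)}
     {\Delta \vdash \mathsf{rec}\ \alpha\,(\Delta').\tau : (\mathit{Dom}(\Delta')) \rightarrow \mathsf{Type}}
\]

\[
\frac{\Delta \vdash c : (\kappa_1, \ldots, \kappa_n) \rightarrow \mathsf{Type} \quad \Delta \vdash c_1 : \kappa_1 \quad \cdots \quad \Delta \vdash c_n : \kappa_n}
     {\Delta \vdash c(c_1, \ldots, c_n) : \mathsf{Type}}
\]

\[ \boxed{\Delta \vdash a_1 = a_2 : \mathsf{Atom}} \]

\[
\frac{\Delta \vdash \eta : \mathsf{Loc} \quad \Delta \vdash \tau = \tau' : \mathsf{Type}}
     {\Delta \vdash \{\eta \mapsto \tau\} = \{\eta \mapsto \tau'\} : \mathsf{Atom}}
\qquad
\frac{\Delta \vdash \epsilon : \mathsf{Store}}
     {\Delta \vdash \epsilon = \epsilon : \mathsf{Atom}}
\]

\[ \boxed{\Delta \vdash c_1 = c_2 : \kappa} \]

\[
\frac{\Delta \vdash c : \kappa}
     {\Delta \vdash c = c : \kappa}
\qquad
\frac{\Delta \vdash c_2 = c_1 : \kappa}
     {\Delta \vdash c_1 = c_2 : \kappa}
\qquad
\frac{\Delta \vdash c_1 = c_2 : \kappa \quad \Delta \vdash c_2 = c_3 : \kappa}
     {\Delta \vdash c_1 = c_3 : \kappa}
\]

\[
\frac{\Delta \vdash a_1 = a_1' : \mathsf{Atom} \quad \cdots \quad \Delta \vdash a_n = a_n' : \mathsf{Atom} \quad a_1', \ldots, a_n' \text{ is a permutation of } a_1'', \ldots, a_n''}
     {\Delta \vdash \emptyset \oplus a_1 \oplus \cdots \oplus a_n = \emptyset \oplus a_1'' \oplus \cdots \oplus a_n'' : \mathsf{Store}}
\]

\[
\frac{\Delta \vdash \tau_1 = \tau_1' : \mathsf{Type} \quad \cdots \quad \Delta \vdash \tau_n = \tau_n' : \mathsf{Type}}
     {\Delta \vdash \langle \tau_1, \ldots, \tau_n \rangle = \langle \tau_1', \ldots, \tau_n' \rangle : \mathsf{Type}}
\qquad
\frac{\Delta \vdash \tau_1 = \tau_1' : \mathsf{Type} \quad \Delta \vdash \tau_2 = \tau_2' : \mathsf{Type}}
     {\Delta \vdash \tau_1 \cup \tau_2 = \tau_1' \cup \tau_2' : \mathsf{Type}}
\]

\[
\frac{\Delta, \Delta' \vdash C = C' : \mathsf{Store} \quad \Delta, \Delta' \vdash \tau_1 = \tau_1' : \mathsf{Type} \quad \cdots \quad \Delta, \Delta' \vdash \tau_n = \tau_n' : \mathsf{Type} \quad (\mathit{Dom}(\Delta) \cap \mathit{Dom}(\Delta') = \emptyset)}
     {\Delta \vdash \forall[\Delta' \mid C].(\tau_1, \ldots, \tau_n) \rightarrow 0 = \forall[\Delta' \mid C'].(\tau_1', \ldots, \tau_n') \rightarrow 0 : \mathsf{Type}}
\]

\[
\frac{\Delta, \Delta' \vdash C = C' : \mathsf{Store} \quad \Delta, \Delta' \vdash \tau = \tau' : \mathsf{Type} \quad (\mathit{Dom}(\Delta) \cap \mathit{Dom}(\Delta') = \emptyset)}
     {\Delta \vdash \exists[\Delta' \mid C].\tau = \exists[\Delta' \mid C'].\tau' : \mathsf{Type}}
\]

\[
\frac{\Delta, \alpha{:}(\mathit{Dom}(\Delta')) \rightarrow \mathsf{Type}, \Delta' \vdash \tau = \tau' : \mathsf{Type} \quad (\mathit{Dom}(\Delta) \cap (\{\alpha\} \cup \mathit{Dom}(\Delta')) = \emptyset)}
     {\Delta \vdash \mathsf{rec}\ \alpha\,(\Delta').\tau = \mathsf{rec}\ \alpha\,(\Delta').\tau' : (\mathit{Dom}(\Delta')) \rightarrow \mathsf{Type}}
\]

\[
\frac{\Delta \vdash c = c' : (\kappa_1, \ldots, \kappa_n) \rightarrow \mathsf{Type} \quad \Delta \vdash c_1 = c_1' : \kappa_1 \quad \cdots \quad \Delta \vdash c_n = c_n' : \kappa_n}
     {\Delta \vdash c(c_1, \ldots, c_n) = c'(c_1', \ldots, c_n') : \mathsf{Type}}
\]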
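
The store-equality rule, which identifies two store types when their atoms agree up to permutation, can be sketched in Python. The sketch makes simplifying assumptions that are not part of the formal system: atoms are hashable tuples, equality of the types inside atoms is approximated by syntactic identity, and stores_equal is a hypothetical name.

```python
from collections import Counter


def stores_equal(store_a, store_b):
    """Compare two store types as multisets of atoms, ignoring order.

    Assumption: two atoms are equal exactly when their tuple encodings are
    identical, which is coarser than the equality judgment on types.
    """
    return Counter(store_a) == Counter(store_b)


# Atoms: a location-to-type binding, or a store variable.
atom_l1 = ("maps", "l1", ("int",))
atom_l2 = ("maps", "l2", ("ptr", "l1"))
atom_eps = ("storevar", "e")

same_up_to_order = stores_equal([atom_l1, atom_l2, atom_eps],
                                [atom_eps, atom_l1, atom_l2])
different_multiplicity = stores_equal([atom_l1], [atom_l1, atom_l1])
```

A multiset comparison is used rather than a set comparison so that a store listing an atom twice is not confused with one listing it once.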